---
title: "Embedding priors"
author: "Tim Kerr"
format:
html:
theme:
light: sketchy
dark: darkly
code-fold: true
code-tools: true
toc: true
self-contained: true
# embed-resources: true
execute:
freeze: auto
warning: false
message: false
output: true
---
```{r packages}
### Packages
library(haven)
library(rstan)
library(loo)
library(StanHeaders)
library(tidyverse)
library(reshape2)
library(ggplot2)
library(ggExtra)
library(ggforce)
library(GGally)
library(patchwork)
library(gganimate)
library(MetBrewer)
library(ggh4x)
library(transformr)
library(data.table)
library(coda)
library(reactable)
library(kableExtra)
# library(reactablefmtr)
library(car)
library(mvtnorm)
library(ggdist)
library(tidybayes)
# library(rstatix)
library(easystats)
library(tidymodels)
library(broom)
library(pixiedust)
library(broom.mixed)
library(brms)
library(cmdstanr)
library(posterior)
library(bayesplot)
```
```{r ggplot_theme, eval=TRUE}
cp.cs <- met.brewer("Hiroshige")[c(1, 10, 2, 8)]
cp.cs <- append(cp.cs, met.brewer("Cassatt2")[c(3, 8)])
theme_tim <- function(base_size = 14) {
  theme_modern(base_size = base_size) %+replace%
    theme(
      # The figure as a whole
      plot.title = element_text(size = rel(1), face = "bold", margin = margin(0, 0, 5, 0), hjust = 0),
      # The plotting panel
      panel.grid.minor = element_blank(),
      panel.grid.major = element_blank(),
      panel.border = element_blank(),
      # Axes
      axis.title = element_text(size = rel(0.85), face = "bold"),
      axis.text = element_text(size = rel(0.70), face = "bold"),
      axis.line = element_line(color = "black"),
      # Legend
      legend.title = element_text(size = rel(0.85), face = "bold"),
      legend.text = element_text(size = rel(0.70), face = "bold"),
      legend.key = element_rect(fill = "transparent", colour = NA),
      legend.key.size = unit(1.5, "lines"),
      legend.background = element_rect(fill = "transparent", colour = NA),
      # Strip labels when facetting
      strip.background = element_rect(fill = "#17252D", color = "#17252D"),
      strip.text = element_text(size = rel(0.85), face = "bold", color = "white", margin = margin(5, 0, 5, 0))
    )
}
# Change the default theme
theme_set(theme_tim(base_size = 10))
```
```{r cmdstanr-setup}
### cmdstanr setup
check_cmdstan_toolchain()
# install_cmdstan(cores = 4, overwrite = TRUE)
cmdstan_path()
cmdstan_version()
options(
  mc.cores = 4,
  brms.threads = 4,
  brms.backend = "cmdstanr",
  brms.file_refit = "on_change"
)
```
```{r}
# Combine draws from the generating fit (ground truth) and the recovery fit
# (posterior) into one long data frame for side-by-side plotting.
merge_prior_post <- function(gen, rec, ...) {
  df1 <- gen %>%
    gather_draws(...) %>% ungroup() %>% mutate(model = "ground truth")
  df2 <- rec %>%
    gather_draws(...) %>% ungroup() %>% mutate(model = "posterior")
  df <- bind_rows(df1, df2) %>% mutate(model = as.factor(model))
  return(df)
}
```
### Model spec
###### Data
Assume a simple Bernoulli-logit task/model, with an individual/subject-level theta determining outcomes on the task. (In reality I'm using a reinforcement learning task, but this is a simple approximation to start with.)
$$ y \sim \text{BernoulliLogit}(\theta)$$
I'm assuming/testing that a psychometric measure, such as the GAD-7, has an impact on task performance. I think the impact should be bi-directional rather than additive (i.e. low GAD scores improve the task score, while high GAD scores worsen task performance).
At the population level, GAD is distributed roughly exponentially: most scores fall between 0 and 5, with a long tail up to the maximum of 21.


To facilitate the bi-directionality of the effect, standardising around a mean of zero seems a reasonable starting point (ideally I would estimate the fuzzy point at which the direction of effect changes).
$$\psi \sim \text{Exponential}(1)$$
$$z_{\psi} = \frac{\psi - \bar\psi}{\sigma_{\psi}} $$
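A quick base-R sanity check on that standardisation (illustrative only; this uses Exponential(1) draws rather than real GAD-7 scores):

```r
# Exponential(1) draws are strictly positive and right-skewed, so after
# centring on the sample mean most scores sit just below zero while the
# long tail sits above it: the sign split a bi-directional effect needs.
set.seed(123)
psi   <- rexp(1000, rate = 1)
z_psi <- (psi - mean(psi)) / sd(psi)

mean(z_psi)      # ~0 by construction
mean(z_psi < 0)  # more than half the sample falls below zero
```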
###### Parameters
The aim then is to estimate the group-level impact of the GAD-7 ($\psi$) on the task, via the parameter $m$, a scalar weight between 0 and 1.
$$ p(\theta) = \text{Normal}(\Theta_{pop} + m\psi,\sigma_{pop}) $$
##### Data generator
I used Stan to generate data in one script, drawing $y$ from a Bernoulli RNG and $\psi$ from an exponential RNG. The priors were fixed at set values, so the data are generated from those prior values.
$$\begin{aligned}
&\large \bf Priors \\
\Theta^{Group/pop} &\sim \text{Normal}(0.3,0.1) \\
\sigma^{Group/pop} &\sim \text{Normal}(0.1,0.1) \\
m &\sim \text{Normal}(0.3,0.01) \\
\psi &\sim \text{Exponential}(1)
\end{aligned}$$
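The generative chain above can be sketched in base R. This is a minimal illustration, not the Stan generator itself: group-level values are fixed at the prior means from the table, the z-scoring of $\psi$ follows the earlier section, and `plogis()` plays the role of the inverse-logit inside BernoulliLogit.

```r
set.seed(456)
N <- 100
n_trials <- 10

psi   <- rexp(N, rate = 1)            # GAD-like scores
z_psi <- (psi - mean(psi)) / sd(psi)  # standardised around zero

Theta_pop <- 0.3  # group mean (logit scale)
sigma_pop <- 0.1  # group sd
m         <- 0.3  # embedding weight

# Subject-level logits, shifted by the weighted GAD score
theta <- rnorm(N, mean = Theta_pop + m * z_psi, sd = sigma_pop)

# n_trials Bernoulli trials per subject; columns index subjects
y <- sapply(theta, function(th) rbinom(n_trials, size = 1, prob = plogis(th)))
dim(y)  # 10 x 100 matrix of binary outcomes
```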
### Running the model
##### Recovery
The generated data were then fed into the same model, and the ground truth compared against the posterior/recovered parameters.
Priors were uninformative:
$$\begin{aligned}
&\large \bf Priors \\
\Theta^{Group/pop} &\sim \text{Normal}(0,1) \\
\sigma^{Group/pop} &\sim \text{Normal}(0,1) \\
m &\sim \text{Normal}(0,1) \\
\end{aligned}$$
Whilst it recovers the group-level theta mean and sd parameters well, it severely underestimates $m$, the parameter of interest.
I wonder if this is due to the exponential distribution of GAD: most people's scores sit in the bulk of the distribution and so have little effect. Perhaps the score needs transforming onto the log scale, or similar, to make this work...
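A rough check of that intuition (illustrative only, not part of the fitted model): `log1p()` compresses the exponential tail, so after standardisation the bulk of scores spreads over a wider range of $z_\psi$ and might carry more signal.

```r
set.seed(789)
psi <- rexp(1000, rate = 1)

z_raw <- as.numeric(scale(psi))         # standardised raw scores
z_log <- as.numeric(scale(log1p(psi)))  # standardised log(1 + psi)

# Sample skewness drops substantially after the log transform
c(skew_raw = mean(z_raw^3), skew_log = mean(z_log^3))
```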
```{r}
file <- "~/FLARe/FLARe_Modelling/Stan_experimental/Embedding_Generator.stan"
mod <- cmdstan_model(file)
mod$check_syntax(pedantic = TRUE)

t_mult <- 1
datalist <- list(N = 100,
                 T = 10 * t_mult,
                 gp_theta_mu_pr = 0.3,
                 gp_theta_sigma_pr = 0.1,
                 gp_sigma_mu_pr = 0.1,
                 gp_sigma_sigma_pr = 0.1,
                 gp_m_mu_pr = 0.3,
                 gp_m_sigma_pr = 0.01,
                 psi_pr = 1)

fit_gen <- mod$sample(
  data = datalist,
  seed = 123,
  chains = 4,
  parallel_chains = 4,
  refresh = 1000,
  iter_warmup = 250,
  iter_sampling = 1000
  # fixed_param = TRUE
)

# Take a single draw from the generator as the "ground truth" dataset
theta <- fit_gen %>% spread_draws(subj_theta[n]) %>% sample_draws(ndraws = 1, seed = 789) %>% pull(subj_theta)
psi <- fit_gen %>% spread_draws(psi_pred[n]) %>% sample_draws(ndraws = 1, seed = 789) %>% pull(psi_pred)
y <- fit_gen %>% spread_draws(y_pred[n,t]) %>% sample_draws(ndraws = 1, seed = 789) %>%
  pull(y_pred) %>% matrix(nrow = 10 * t_mult, ncol = 100) %>% t()

file <- "~/FLARe/FLARe_Modelling/Stan_experimental/Embedding_Recovery.stan"
mod <- cmdstan_model(file)
mod$check_syntax(pedantic = TRUE)

datalist <- list(N = 100,
                 T = 10 * t_mult,
                 y = y,
                 gp_theta_mu_pr = 0,
                 gp_theta_sigma_pr = 1,
                 gp_sigma_mu_pr = 0,
                 gp_sigma_sigma_pr = 1,
                 gp_m_mu_pr = 0,
                 gp_m_sigma_pr = 1,
                 psi_y = psi)

fit_rec <- mod$sample(
  data = datalist,
  seed = 123,
  chains = 4,
  parallel_chains = 4,
  refresh = 1000,
  iter_warmup = 250,
  iter_sampling = 1000
  # fixed_param = TRUE
)

summ <- fit_gen$summary()
summ[1:4, ] %>% kable(digits = 3, caption = "Ground Truth")
summ2 <- fit_rec$summary()
summ2[1:4, ] %>% kable(digits = 3, caption = "Recovery")

merged_df <- merge_prior_post(fit_gen, fit_rec, gp_theta, gp_sigma, gp_m)
merged_df %>%
  ggplot(aes(y = .variable, x = .value)) +
  stat_dotsinterval(aes(group = model,
                        slab_colour = model,
                        slab_fill = model,
                        interval_colour = model,
                        point_colour = model)) +
  facet_wrap(~.variable, scales = "free_x") +
  ggtitle("PPC of group level parameters")

# PPCs
fit_rec %>% spread_draws(subj_theta[n]) %>%
  ggplot(aes(x = subj_theta)) +
  stat_dots(quantiles = 100, fill = "red", alpha = 0.5) +
  stat_dots(data = as.data.frame(theta), aes(x = theta),
            quantiles = 100, fill = "blue", alpha = 0.5, inherit.aes = TRUE) +
  ggtitle("PPC of subject level theta parameters, Red = generated/estimated, Blue = real/ground truth")

fit_rec %>% spread_draws(y_pred[n,t]) %>% sample_draws(ndraws = 1, seed = 789) %>%
  ggplot(aes(x = y_pred)) +
  stat_dots(quantiles = 100, fill = "red", alpha = 0.5) +
  stat_dots(data = as.data.frame(as.vector(y)), aes(x = `as.vector(y)` + 0.1),
            quantiles = 100, fill = "blue", alpha = 0.5, inherit.aes = TRUE) +
  ggtitle("PPC of Bernoulli data, Red = generated/estimated, Blue = real/ground truth (shifted right to see easier)")
```
### Changing the priors
Running the same recovery, but with more precise/informative priors:
$$\begin{aligned}
&\large \bf Priors \\
\Theta^{Group/pop} &\sim \text{Normal}(0,1) \\
\sigma^{Group/pop} &\sim \text{Normal}(0,1) \\
m &\sim \text{Normal}(0.3,0.1) \\
\end{aligned}$$
```{r}
file <- "~/FLARe/FLARe_Modelling/Stan_experimental/Embedding_Recovery.stan"
mod <- cmdstan_model(file)
mod$check_syntax(pedantic = TRUE)

datalist <- list(N = 100,
                 T = 10 * t_mult,
                 y = y,
                 gp_theta_mu_pr = 0,
                 gp_theta_sigma_pr = 1,
                 gp_sigma_mu_pr = 0,
                 gp_sigma_sigma_pr = 1,
                 gp_m_mu_pr = 0.3,
                 gp_m_sigma_pr = 0.1,
                 psi_y = psi)

fit_rec <- mod$sample(
  data = datalist,
  seed = 123,
  chains = 4,
  parallel_chains = 4,
  refresh = 1000,
  iter_warmup = 250,
  iter_sampling = 1000
  # fixed_param = TRUE
)

summ <- fit_gen$summary()
summ[1:4, ] %>% kable(digits = 3, caption = "Ground Truth")
summ2 <- fit_rec$summary()
summ2[1:4, ] %>% kable(digits = 3, caption = "Recovery")

merged_df <- merge_prior_post(fit_gen, fit_rec, gp_theta, gp_sigma, gp_m)
merged_df %>%
  ggplot(aes(y = .variable, x = .value)) +
  stat_dotsinterval(aes(group = model,
                        slab_colour = model,
                        slab_fill = model,
                        interval_colour = model,
                        point_colour = model)) +
  facet_wrap(~.variable, scales = "free_x") +
  ggtitle("PPC of group level parameters")

# PPCs
fit_rec %>% spread_draws(subj_theta[n]) %>%
  ggplot(aes(x = subj_theta)) +
  stat_dots(quantiles = 100, fill = "red", alpha = 0.5) +
  stat_dots(data = as.data.frame(theta), aes(x = theta),
            quantiles = 100, fill = "blue", alpha = 0.5, inherit.aes = TRUE) +
  ggtitle("PPC of subject level theta parameters, Red = generated/estimated, Blue = real/ground truth")

fit_rec %>% spread_draws(y_pred[n,t]) %>% sample_draws(ndraws = 1, seed = 789) %>%
  ggplot(aes(x = y_pred)) +
  stat_dots(quantiles = 100, fill = "red", alpha = 0.5) +
  stat_dots(data = as.data.frame(as.vector(y)), aes(x = `as.vector(y)` + 0.1),
            quantiles = 100, fill = "blue", alpha = 0.5, inherit.aes = TRUE) +
  ggtitle("PPC of Bernoulli data, Red = generated/estimated, Blue = real/ground truth (shifted right to see easier)")
```
### Increasing the magnitude of effect
Is the effect too subtle to be picked up? Increasing the value of $m$ to 0.7...
```{r}
file <- "~/FLARe/FLARe_Modelling/Stan_experimental/Embedding_Generator.stan"
mod <- cmdstan_model(file)
mod$check_syntax(pedantic = TRUE)

t_mult <- 1
datalist <- list(N = 100,
                 T = 10 * t_mult,
                 gp_theta_mu_pr = 0.3,
                 gp_theta_sigma_pr = 0.1,
                 gp_sigma_mu_pr = 0.1,
                 gp_sigma_sigma_pr = 0.1,
                 gp_m_mu_pr = 0.7,
                 gp_m_sigma_pr = 0.01,
                 psi_pr = 1)

fit_gen <- mod$sample(
  data = datalist,
  seed = 123,
  chains = 4,
  parallel_chains = 4,
  refresh = 1000,
  iter_warmup = 250,
  iter_sampling = 1000
  # fixed_param = TRUE
)

# Take a single draw from the generator as the "ground truth" dataset
theta <- fit_gen %>% spread_draws(subj_theta[n]) %>% sample_draws(ndraws = 1, seed = 789) %>% pull(subj_theta)
psi <- fit_gen %>% spread_draws(psi_pred[n]) %>% sample_draws(ndraws = 1, seed = 789) %>% pull(psi_pred)
y <- fit_gen %>% spread_draws(y_pred[n,t]) %>% sample_draws(ndraws = 1, seed = 789) %>%
  pull(y_pred) %>% matrix(nrow = 10 * t_mult, ncol = 100) %>% t()

file <- "~/FLARe/FLARe_Modelling/Stan_experimental/Embedding_Recovery.stan"
mod <- cmdstan_model(file)
mod$check_syntax(pedantic = TRUE)

datalist <- list(N = 100,
                 T = 10 * t_mult,
                 y = y,
                 gp_theta_mu_pr = 0,
                 gp_theta_sigma_pr = 1,
                 gp_sigma_mu_pr = 0,
                 gp_sigma_sigma_pr = 1,
                 gp_m_mu_pr = 0,
                 gp_m_sigma_pr = 1,
                 psi_y = psi)

fit_rec <- mod$sample(
  data = datalist,
  seed = 123,
  chains = 4,
  parallel_chains = 4,
  refresh = 1000,
  iter_warmup = 250,
  iter_sampling = 1000
  # fixed_param = TRUE
)

summ <- fit_gen$summary()
summ[1:4, ] %>% kable(digits = 3, caption = "Ground Truth")
summ2 <- fit_rec$summary()
summ2[1:4, ] %>% kable(digits = 3, caption = "Recovery")

merged_df <- merge_prior_post(fit_gen, fit_rec, gp_theta, gp_sigma, gp_m)
merged_df %>%
  ggplot(aes(y = .variable, x = .value)) +
  stat_dotsinterval(aes(group = model,
                        slab_colour = model,
                        slab_fill = model,
                        interval_colour = model,
                        point_colour = model)) +
  facet_wrap(~.variable, scales = "free_x") +
  ggtitle("PPC of group level parameters")

# PPCs
fit_rec %>% spread_draws(subj_theta[n]) %>%
  ggplot(aes(x = subj_theta)) +
  stat_dots(quantiles = 100, fill = "red", alpha = 0.5) +
  stat_dots(data = as.data.frame(theta), aes(x = theta),
            quantiles = 100, fill = "blue", alpha = 0.5, inherit.aes = TRUE) +
  ggtitle("PPC of subject level theta parameters, Red = generated/estimated, Blue = real/ground truth")

fit_rec %>% spread_draws(y_pred[n,t]) %>% sample_draws(ndraws = 1, seed = 789) %>%
  ggplot(aes(x = y_pred)) +
  stat_dots(quantiles = 100, fill = "red", alpha = 0.5) +
  stat_dots(data = as.data.frame(as.vector(y)), aes(x = `as.vector(y)` + 0.1),
            quantiles = 100, fill = "blue", alpha = 0.5, inherit.aes = TRUE) +
  ggtitle("PPC of Bernoulli data, Red = generated/estimated, Blue = real/ground truth (shifted right to see easier)")
```
This only makes the posterior estimates worse!
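One speculative explanation (mine, not established above): with a larger $m$, the long positive tail of $z_\psi$ pushes those subjects' logits into the flat region of the inverse-logit, where the Bernoulli probabilities saturate and the data become only weakly informative about $m$. A base-R sketch of that sensitivity, using illustrative $z_\psi$ values:

```r
# Sensitivity of a subject's success probability to m: d p / d m = z * p * (1 - p).
# In the tail of z_psi, a bigger m pushes p toward 1, where p * (1 - p) -> 0,
# so those subjects stop constraining m.
Theta <- 0.3
z_psi <- c(-0.5, 0, 1, 3, 5)  # bulk values vs tail values (illustrative)
sens <- function(m) {
  p <- plogis(Theta + m * z_psi)
  z_psi * p * (1 - p)
}
round(sens(0.3), 3)
round(sens(0.7), 3)  # tail sensitivity shrinks as m grows
```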
```{r}
mod$code()
```
<!-- ```{r} -->
<!-- #ppc -->
<!-- fit_rec %>% spread_draws(subj_theta[n]) %>% ggplot(aes(x = subj_theta)) + stat_dots(quantiles = 100, fill = "red", alpha = 0.5) + stat_dots(data = as.data.frame(theta), aes(x = theta), quantiles = 100, fill = "blue", alpha = 0.5, inherit.aes = TRUE) + ggtitle("PPC, Red = generated/estimated, Blue = real/ground truth") -->
<!-- fit_rec %>% spread_draws(y_pred[n,t]) %>% sample_draws(ndraws = 1, seed = 789) %>% ggplot(aes(x = y_pred)) + stat_dots(quantiles = 100, fill = "red", alpha = 0.5) + stat_dots(data = as.data.frame(as.vector(y)), aes(x = `as.vector(y)`+0.1), quantiles = 100, fill = "blue", alpha = 0.5, inherit.aes = TRUE) + ggtitle("PPC Y, Red = generated/estimated, Blue = real/ground truth") -->
<!-- hist(rnorm(1000,0.473,0.285)) -->
<!-- ``` -->
<!-- https://osf.io/preprints/psyarxiv/jxvs3 -->
<!-- https://www.pnas.org/doi/full/10.1073/pnas.2009641117#sec-3 -->
<!-- Per these two papers, gad was embedded into the learning rate estimation. -->
<!-- In these papers, the parameter of interest is drawn as a random variable from a normal distribution: -->
<!-- $$ p(\theta)_{pop} = \text{Normal}(\Theta_{pop},\sigma_{pop}) $$ -->
<!-- Then they add their embedding value and a scalar to modify its impact, to the population mean: -->
<!-- $$ p(\theta)_{pop} = \text{Normal}(\Theta_{pop} + m\psi,\sigma_{pop}) $$ -->
<!-- Where $\psi$ is the embedding value and $m$ is the scalar. -->
<!-- The learning rate is drawn differently, as it is calculated from the RW equation. The hierarchical estimation comes from the Phi Approx function: -->
<!-- $$ LR_s = \Phi(p_{pop}(LR_{\mu}) + p_{pop}(LR_{\sigma}) * p_s(LR_{\sigma}))$$ -->
<!-- Where $p_{pop}(LR_{\mu})$ and $p_{pop}(LR_{\sigma})$ are the population means and standard deviations of the learning rate, and $p_s(LR_{\sigma})$ is the subject specific standard deviation of the learning rate. The latter alters the width of the distribution, the former terms move the mean/mode about. -->
<!-- First this model was attempted: -->
<!-- $$ LR_s = \Phi(p_{pop}(LR_{\mu}) + (m\psi) + p_{pop}(LR_{\sigma}) * p_s(LR_{\sigma}))$$ -->
<!-- Where $\psi$ is the gad score, standardised, and $m$ is a free parameter constrained to $[0,1$ (no hierarchy informing it). -->
<!-- # Simulations -->
<!-- Model (standard) with no embedding, should look much like the real data -->
<!-- Model with embedding, go strong so the effect is obvious -->
<!-- Then see which model best fits each data -->
<!-- The standard model should fit embedded data, but not massively help the dimension -->
<!-- The embedded model should show the embedded data well, better than the standard model (reducing false negatives) -->
<!-- The embedded model should ideally not show a dimension in standard data (ie not hallucinate or create false positives) -->
<!-- Will just have three LRs and one precision per the best working model thus far. Directions on the embedding will differ for each rate. -->
<!-- ## Standard model -->
<!-- real gp_theta_mu_pr; -->
<!-- real gp_theta_sigma_pr; -->
<!-- real gp_sigma_mu_pr; -->
<!-- real gp_sigma_sigma_pr; -->
<!-- $$ p(\theta)_{pop} = \text{Normal}(\Theta_{pop},\sigma_{pop}) $$ -->
<!-- $$\begin{aligned} -->
<!-- &\large \bf Priors \\ -->
<!-- \Theta^{Group/pop} &\sim \text{Normal}(5,1) \\ -->
<!-- \sigma^{Group/pop} &\sim \text{Normal}(2,1) \\ -->
<!-- \end{aligned}$$ -->
<!-- Data was generated via Stan, with priors set as above. -->
<!-- ```{r} -->
<!-- file <- "~/FLARe/FLARe_Modelling/Stan_experimental/normal_hierarchical.stan" -->
<!-- mod <- cmdstan_model(file) -->
<!-- mod$check_syntax(pedantic = TRUE) -->
<!-- datalist <- list(N = 100, -->
<!-- theta = rnorm(100,0,1), -->
<!-- prior_only = 1, -->
<!-- gp_theta_mu_pr = 5, -->
<!-- gp_theta_sigma_pr = 1, -->
<!-- gp_sigma_mu_pr = 2, -->
<!-- gp_sigma_sigma_pr = 1 -->
<!-- ) -->
<!-- fit_gen <- mod$sample( -->
<!-- data = datalist, -->
<!-- seed = 123, -->
<!-- chains = 4, -->
<!-- parallel_chains = 4, -->
<!-- refresh = 100, -->
<!-- iter_warmup = 250, -->
<!-- iter_sampling = 1000 -->
<!-- ) -->
<!-- # fit_psi$summary() -->
<!-- fit_gen %>% spread_draws(theta_pred[n]) %>% ggplot(aes(x = theta_pred)) + stat_dots(quantiles = 100) -->
<!-- theta <- fit_gen %>% spread_draws(theta_pred[n]) %>% posterior::summarise_draws() %>% pull(median) -->
<!-- theta <- fit_gen %>% spread_draws(theta_pred[n]) %>% sample_draws(ndraws = 1, seed = 789) %>% pull(theta_pred) -->
<!-- hist(theta) -->
<!-- fit_gen$summary() -->
<!-- ``` -->
<!-- Parameter recovery from same model. The seed was changed, uninformative prior set. -->
<!-- ```{r} -->
<!-- file <- "~/FLARe/FLARe_Modelling/Stan_experimental/normal_hierarchical.stan" -->
<!-- mod <- cmdstan_model(file) -->
<!-- mod$check_syntax(pedantic = TRUE) -->
<!-- datalist <- list(N = 100, -->
<!-- theta = theta, -->
<!-- prior_only = 0, -->
<!-- gp_theta_mu_pr = 0, -->
<!-- gp_theta_sigma_pr = 1, -->
<!-- gp_sigma_mu_pr = 0, -->
<!-- gp_sigma_sigma_pr = 1 -->
<!-- ) -->
<!-- fit_rec <- mod$sample( -->
<!-- data = datalist, -->
<!-- seed = 456, -->
<!-- chains = 4, -->
<!-- parallel_chains = 4, -->
<!-- refresh = 100, -->
<!-- iter_warmup = 250, -->
<!-- iter_sampling = 1000 -->
<!-- ) -->
<!-- fit_rec %>% spread_draws(theta_pred[n]) %>% ggplot(aes(x = theta_pred)) + stat_dots(quantiles = 100, fill = "red", alpha = 0.5) + stat_dots(data = as.data.frame(theta), aes(x = theta), quantiles = 100, fill = "blue", alpha = 0.5, inherit.aes = TRUE) + ggtitle("PPC, Red = generated, Blue = real/ground truth") -->
<!-- # Merge the two dfs to show real and estimated params side by side. -->
<!-- fit_rec %>% -->
<!-- gather_draws(gp_theta, gp_sigma) %>% ungroup() %>% ggplot(aes(y = .variable, x = .value)) + stat_eye() + facet_wrap(~.variable, scales = "free_x") -->
<!-- fit_gen %>% gather_draws(gp_theta, gp_sigma) %>% ggplot(aes(y = .variable, x = .value)) + stat_eye() + facet_wrap(~.variable, scales = "free_x") -->
<!-- merge_prior_post <- function(gen,rec,...){ -->
<!-- df1 <- gen %>% -->
<!-- gather_draws(...) %>% ungroup() %>% mutate(model = "ground truth") -->
<!-- df2 <- rec %>% -->
<!-- gather_draws(...) %>% ungroup() %>% mutate(model = "posterior") -->
<!-- df <- bind_rows(df1,df2) %>% mutate(model = as.factor(model)) -->
<!-- return(df) -->
<!-- } -->
<!-- merged_df <- merge_prior_post(fit_gen,fit_rec,gp_theta, gp_sigma) -->
<!-- merged_df %>% -->
<!-- # filter(model == "prior") %>% -->
<!-- ggplot(aes(y = .variable, x = .value)) + -->
<!-- stat_dotsinterval(aes(group = model, -->
<!-- slab_colour = model, -->
<!-- slab_fill = model, -->
<!-- interval_colour = model, -->
<!-- point_colour = model)) + -->
<!-- facet_wrap(~.variable, scales = "free_x") -->
<!-- ``` -->
<!-- ## Embedded model -->
<!-- $$\psi = \text{Exponential}(0.1)$$ -->
<!-- $$ p(\theta)_{pop} = \text{Normal}(\Theta_{pop} + m\psi,\sigma_{pop}) $$ -->
<!-- real gp_theta_mu_pr; -->
<!-- real gp_theta_sigma_pr; -->
<!-- real gp_sigma_mu_pr; -->
<!-- real gp_sigma_sigma_pr; -->
<!-- real gp_m_mu_pr; -->
<!-- real gp_m_sigma_pr; -->
<!-- real psi_pr; -->
<!-- $$ p(\theta)_{pop} = \text{Normal}(\Theta_{pop},\sigma_{pop}) $$ -->
<!-- $$\begin{aligned} -->
<!-- &\large \bf Priors \\ -->
<!-- \Theta^{Group/pop} &\sim \text{Normal}(5,1) \\ -->
<!-- \sigma^{Group/pop} &\sim \text{Normal}(2,1) \\ -->
<!-- m &\sim \text{Normal}(0.3,0.1) \\ -->
<!-- \psi &\sim \text{Exponential}(0.1) -->
<!-- \end{aligned}$$ -->
<!-- Data was again generated via Stan, with priors set as above. Graphs demonstrate the difference in setting the gradient at various levels of impact. -->
<!-- ```{r} -->
<!-- file <- "~/FLARe/FLARe_Modelling/Stan_experimental/normal_hierarchical_embed.stan" -->
<!-- mod <- cmdstan_model(file) -->
<!-- mod$check_syntax(pedantic = TRUE) -->
<!-- for (i in c(0.1,0.3,0.5,0.7)){ -->
<!-- datalist <- list(N = 100, -->
<!-- theta = rnorm(100,0,1), -->
<!-- psi = rnorm(100,0,1), -->
<!-- prior_only = 1, -->
<!-- gp_theta_mu_pr = 5, -->
<!-- gp_theta_sigma_pr = 1, -->
<!-- gp_sigma_mu_pr = 2, -->
<!-- gp_sigma_sigma_pr = 1, -->
<!-- gp_m_mu_pr = i, -->
<!-- gp_m_sigma_pr = 0.1, -->
<!-- psi_pr = 0.1 -->
<!-- ) -->
<!-- fit_gen <- mod$sample( -->
<!-- data = datalist, -->
<!-- seed = 123, -->
<!-- chains = 4, -->
<!-- parallel_chains = 4, -->
<!-- refresh = 100, -->
<!-- iter_warmup = 250, -->
<!-- iter_sampling = 1000 -->
<!-- ) -->
<!-- theta <- fit_gen %>% spread_draws(theta_pred[n]) %>% sample_draws(ndraws = 1, seed = 789) %>% select(!starts_with(".")) -->
<!-- psi <- fit_gen %>% spread_draws(psi_pred[n]) %>% sample_draws(ndraws = 1, seed = 789) %>% select(!starts_with(".")) -->
<!-- df <- left_join(theta,psi,by = "n") -->
<!-- assign(paste("p",i,sep = "_"), -->
<!-- ggplot(df, aes(x = theta_pred, y = psi_pred)) + -->
<!-- geom_point() + -->
<!-- geom_smooth(method = "lm", se = FALSE) + -->
<!-- ggtitle(paste("m set at",i)) -->
<!-- ) -->
<!-- } -->
<!-- p_0.1 + p_0.3 + p_0.5 + p_0.7 -->
<!-- ``` -->
<!-- ```{r} -->
<!-- file <- "~/FLARe/FLARe_Modelling/Stan_experimental/normal_hierarchical_embed.stan" -->
<!-- mod <- cmdstan_model(file) -->
<!-- mod$check_syntax(pedantic = TRUE) -->
<!-- datalist <- list(N = 100, -->
<!-- theta = rnorm(100,0,1), -->
<!-- psi = rnorm(100,0,1), -->
<!-- prior_only = 1, -->
<!-- gp_theta_mu_pr = 5, -->
<!-- gp_theta_sigma_pr = 1, -->
<!-- gp_sigma_mu_pr = 2, -->
<!-- gp_sigma_sigma_pr = 1, -->
<!-- gp_m_mu_pr = 0.3, -->
<!-- gp_m_sigma_pr = 0.1, -->
<!-- psi_pr = 0.1 -->
<!-- ) -->
<!-- fit_gen <- mod$sample( -->
<!-- data = datalist, -->
<!-- seed = 123, -->
<!-- chains = 4, -->
<!-- parallel_chains = 4, -->
<!-- refresh = 100, -->
<!-- iter_warmup = 250, -->
<!-- iter_sampling = 1000 -->
<!-- ) -->
<!-- # fit_psi$summary() -->
<!-- fit_gen %>% spread_draws(theta_pred[n]) %>% ggplot(aes(x = theta_pred)) + stat_dots(quantiles = 100) -->
<!-- fit_gen %>% spread_draws(psi_pred[n]) %>% ggplot(aes(x = psi_pred)) + stat_dots(quantiles = 100) -->
<!-- theta <- fit_gen %>% spread_draws(theta_pred[n]) %>% sample_draws(ndraws = 1, seed = 789) %>% pull(theta_pred) -->
<!-- hist(theta) -->
<!-- psi <- fit_gen %>% spread_draws(psi_pred[n]) %>% sample_draws(ndraws = 1, seed = 789) %>% pull(psi_pred) -->
<!-- hist(psi) -->
<!-- ``` -->
<!-- #### Recovery with same model -->
<!-- Parameter recovery from same model. The seed was changed, uninformative prior set. -->
<!-- ```{r} -->
<!-- file <- "~/FLARe/FLARe_Modelling/Stan_experimental/normal_hierarchical_embed.stan" -->
<!-- mod <- cmdstan_model(file) -->
<!-- mod$check_syntax(pedantic = TRUE) -->
<!-- datalist <- list(N = 100, -->
<!-- theta = theta, -->
<!-- psi = psi, -->
<!-- prior_only = 0, -->
<!-- gp_theta_mu_pr = 0, -->
<!-- gp_theta_sigma_pr = 1, -->
<!-- gp_sigma_mu_pr = 0, -->
<!-- gp_sigma_sigma_pr = 1, -->
<!-- gp_m_mu_pr = 0, -->
<!-- gp_m_sigma_pr = 1, -->
<!-- psi_pr = 0.1 -->
<!-- ) -->
<!-- fit_rec <- mod$sample( -->
<!-- data = datalist, -->
<!-- seed = 456, -->
<!-- chains = 4, -->
<!-- parallel_chains = 4, -->
<!-- refresh = 100, -->
<!-- iter_warmup = 250, -->
<!-- iter_sampling = 1000 -->
<!-- ) -->
<!-- # fit_rec$summary() -->
<!-- fit_rec %>% spread_draws(theta_pred[n]) %>% ggplot(aes(x = theta_pred)) + stat_dots(quantiles = 100, fill = "red", alpha = 0.5) + stat_dots(data = as.data.frame(theta), aes(x = theta), quantiles = 100, fill = "blue", alpha = 0.5, inherit.aes = TRUE) + ggtitle("PPC, Red = generated/estimated, Blue = real/ground truth") -->
<!-- fit_rec %>% spread_draws(psi_pred[n]) %>% ggplot(aes(x = psi_pred)) + stat_dots(quantiles = 100, fill = "red", alpha = 0.5) + stat_dots(data = as.data.frame(psi), aes(x = psi), quantiles = 100, fill = "blue", alpha = 0.5, inherit.aes = TRUE) + ggtitle("PPC, Red = generated/estimated, Blue = real/ground truth") -->
<!-- # Merge the two dfs to show real and estimated params side by side. -->
<!-- fit_rec %>% -->
<!-- gather_draws(gp_theta, gp_sigma, gp_m) %>% ungroup() %>% ggplot(aes(y = .variable, x = .value)) + stat_eye() + facet_wrap(~.variable, scales = "free_x") -->
<!-- fit_gen %>% gather_draws(gp_theta, gp_sigma, gp_m) %>% ggplot(aes(y = .variable, x = .value)) + stat_eye() + facet_wrap(~.variable, scales = "free_x") -->
<!-- merge_prior_post <- function(gen,rec,...){ -->
<!-- df1 <- gen %>% -->
<!-- gather_draws(...) %>% ungroup() %>% mutate(model = "ground truth") -->
<!-- df2 <- rec %>% -->
<!-- gather_draws(...) %>% ungroup() %>% mutate(model = "posterior") -->
<!-- df <- bind_rows(df1,df2) %>% mutate(model = as.factor(model)) -->
<!-- return(df) -->
<!-- } -->
<!-- merged_df <- merge_prior_post(fit_gen,fit_rec,gp_theta, gp_sigma, gp_m) -->
<!-- merged_df %>% -->
<!-- # filter(model == "prior") %>% -->
<!-- ggplot(aes(y = .variable, x = .value)) + -->
<!-- stat_dotsinterval(aes(group = model, -->
<!-- slab_colour = model, -->
<!-- slab_fill = model, -->
<!-- interval_colour = model, -->
<!-- point_colour = model)) + -->
<!-- facet_wrap(~.variable, scales = "free_x") -->
<!-- ``` -->
<!-- #### Recovery with standard model -->
<!-- Parameter recovery from standard, less complex model. The seed was changed, uninformative prior set. -->
<!-- ```{r} -->
<!-- file <- "~/FLARe/FLARe_Modelling/Stan_experimental/normal_hierarchical.stan" -->
<!-- mod <- cmdstan_model(file) -->
<!-- mod$check_syntax(pedantic = TRUE) -->
<!-- datalist <- list(N = 100, -->
<!-- theta = theta, -->
<!-- psi = psi, -->
<!-- prior_only = 0, -->
<!-- gp_theta_mu_pr = 0, -->
<!-- gp_theta_sigma_pr = 1, -->
<!-- gp_sigma_mu_pr = 0, -->
<!-- gp_sigma_sigma_pr = 1, -->
<!-- gp_m_mu_pr = 0, -->
<!-- gp_m_sigma_pr = 1, -->
<!-- psi_pr = 0.1 -->
<!-- ) -->
<!-- fit_rec_2 <- mod$sample( -->
<!-- data = datalist, -->
<!-- seed = 456, -->
<!-- chains = 4, -->
<!-- parallel_chains = 4, -->
<!-- refresh = 100, -->
<!-- iter_warmup = 250, -->
<!-- iter_sampling = 1000 -->
<!-- ) -->
<!-- fit_rec_2 %>% spread_draws(theta_pred[n]) %>% ggplot(aes(x = theta_pred)) + stat_dots(quantiles = 100, fill = "red", alpha = 0.5) + stat_dots(data = as.data.frame(theta), aes(x = theta), quantiles = 100, fill = "blue", alpha = 0.5, inherit.aes = TRUE) + ggtitle("PPC, Red = generated/estimated, Blue = real/ground truth") -->
<!-- # fit_rec %>% spread_draws(psi_pred[n]) %>% ggplot(aes(x = psi_pred)) + stat_dots(quantiles = 100, fill = "red", alpha = 0.5) + stat_dots(data = as.data.frame(psi), aes(x = psi), quantiles = 100, fill = "blue", alpha = 0.5, inherit.aes = TRUE) + ggtitle("PPC, Red = generated/estimated, Blue = real/ground truth") -->
<!-- # Merge the two dfs to show real and estimated params side by side. -->
<!-- fit_rec_2 %>% -->
<!-- gather_draws(gp_theta, gp_sigma) %>% ungroup() %>% ggplot(aes(y = .variable, x = .value)) + stat_eye() + facet_wrap(~.variable, scales = "free_x") -->
<!-- fit_gen %>% gather_draws(gp_theta, gp_sigma, gp_m) %>% ggplot(aes(y = .variable, x = .value)) + stat_eye() + facet_wrap(~.variable, scales = "free_x") -->
<!-- merge_prior_post <- function(gen,rec,...){ -->
<!-- df1 <- gen %>% -->
<!-- gather_draws(...) %>% ungroup() %>% mutate(model = "ground truth") -->
<!-- df2 <- rec %>% -->
<!-- gather_draws(...) %>% ungroup() %>% mutate(model = "posterior") -->
<!-- df <- bind_rows(df1,df2) %>% mutate(model = as.factor(model)) -->
<!-- return(df) -->
<!-- } -->
<!-- merged_df <- merge_prior_post(fit_gen,fit_rec_2,gp_theta, gp_sigma) -->
<!-- merged_df %>% -->
<!-- # filter(model == "prior") %>% -->
<!-- ggplot(aes(y = .variable, x = .value)) + -->
<!-- stat_dotsinterval(aes(group = model, -->
<!-- slab_colour = model, -->
<!-- slab_fill = model, -->
<!-- interval_colour = model, -->
<!-- point_colour = model)) + -->
<!-- facet_wrap(~.variable, scales = "free_x") -->
<!-- ``` -->
<!-- ```{r} -->
<!-- fit_rec$summary(variables = "log_lik") -->
<!-- loo_same <- fit_rec$loo(r_eff = TRUE) -->
<!-- loo_standard <- fit_rec_2$loo(r_eff = TRUE) -->
<!-- comp <- loo_compare(loo_same,loo_standard) -->
<!-- rownames(comp) <- c("Same model","Standard model") -->
<!-- print(comp, simplify = FALSE, digits = 3) -->
<!-- ``` -->
<!-- ## Will this work with phi approx? -->
<!-- Well first the model was expanded to incorporate data, as it failed to make sense without it. And the previous models were effectively using individual thetas as data. -->
<!-- A simple bernoulli trial was introduced, where theta was the probability of success. -->
<!-- $$ y \sim \text{Bernoulli}(\theta)$$ -->
<!-- Phi approx is used to put theta in the unit -->
<!-- $$\theta = \text{PhiApprox}(\mu + \sigma * individual variance)$$ -->
<!-- ```{r} -->
<!-- file <- "~/FLARe/FLARe_Modelling/Stan_experimental/normal_hierarchical_phi.stan" -->
<!-- mod <- cmdstan_model(file) -->
<!-- mod$check_syntax(pedantic = TRUE) -->
<!-- datalist <- list(N = 100, -->
<!-- T = 10, -->
<!-- # theta = rnorm(100,0,1), -->
<!-- # psi = rnorm(100,0,1), -->
<!-- y = matrix(data = 0, nrow = 100, ncol = 10), -->
<!-- prior_only = 1, -->
<!-- gp_theta_mu_pr = 0.3, -->
<!-- gp_theta_sigma_pr = 0.1, -->
<!-- gp_sigma_mu_pr = 0.1, -->
<!-- gp_sigma_sigma_pr = 0.1, -->
<!-- ind_scale_mu_pr = 0, -->
<!-- ind_scale_sigma_pr = 0.1 -->
<!-- # gp_m_mu_pr = 0.3, -->
<!-- # gp_m_sigma_pr = 0.1, -->
<!-- # psi_pr = 0.1 -->
<!-- ) -->
<!-- fit_gen <- mod$sample( -->
<!-- data = datalist, -->
<!-- seed = 123, -->
<!-- chains = 4, -->
<!-- parallel_chains = 4, -->
<!-- refresh = 100, -->
<!-- iter_warmup = 250, -->
<!-- iter_sampling = 1000, -->
<!-- # fixed_param = TRUE -->
<!-- ) -->
<!-- # fit_psi$summary() -->
<!-- fit_gen %>% spread_draws(theta[n]) %>% ggplot(aes(x = theta)) + stat_dots(quantiles = 100) -->
<!-- # fit_gen %>% spread_draws(psi_pred[n]) %>% ggplot(aes(x = psi_pred)) + stat_dots(quantiles = 100) -->
<!-- theta <- fit_gen %>% spread_draws(theta[n]) %>% sample_draws(ndraws = 1, seed = 789) %>% pull(theta) -->
<!-- hist(theta) -->
<!-- # psi <- fit_gen %>% spread_draws(psi_pred[n]) %>% sample_draws(ndraws = 1, seed = 789) %>% pull(psi_pred) -->
<!-- # hist(psi) -->
<!-- fit_gen$summary() -->
<!-- y <- fit_gen %>% spread_draws(y_pred[n,t]) %>% sample_draws(ndraws = 1, seed = 789) %>% pull(y_pred) %>% matrix(nrow = 10, ncol =100) %>% t() -->
<!-- hist(y) -->
<!-- ``` -->
<!-- #### Recovery -->
<!-- ```{r} -->
<!-- file <- "~/FLARe/FLARe_Modelling/Stan_experimental/normal_hierarchical_phi.stan" -->
<!-- mod <- cmdstan_model(file) -->
<!-- mod$check_syntax(pedantic = TRUE) -->
<!-- datalist <- list(N = 100, -->
<!-- T = 10, -->
<!-- # theta = rnorm(100,0,1), -->
<!-- # psi = rnorm(100,0,1), -->
<!-- y = y, -->
<!-- prior_only = 0, -->
<!-- gp_theta_mu_pr = 0, -->
<!-- gp_theta_sigma_pr = 1, -->
<!-- gp_sigma_mu_pr = 0, -->
<!-- gp_sigma_sigma_pr = 1, -->
<!-- ind_scale_mu_pr = 0, -->
<!-- ind_scale_sigma_pr = 1 -->
<!-- # gp_m_mu_pr = 0.3, -->
<!-- # gp_m_sigma_pr = 0.1, -->
<!-- # psi_pr = 0.1 -->
<!-- ) -->
<!-- fit_rec <- mod$sample( -->
<!-- data = datalist, -->
<!-- seed = 123, -->
<!-- chains = 4, -->
<!-- parallel_chains = 4, -->
<!-- refresh = 100, -->
<!-- iter_warmup = 250, -->
<!-- iter_sampling = 1000, -->
<!-- # fixed_param = TRUE -->
<!-- ) -->
<!-- # fit_rec$summary() -->
<!-- fit_rec %>% spread_draws(theta[n]) %>% ggplot(aes(x = theta)) + stat_dots(quantiles = 100, fill = "red", alpha = 0.5) + stat_dots(data = as.data.frame(theta), aes(x = theta), quantiles = 100, fill = "blue", alpha = 0.5, inherit.aes = TRUE) + ggtitle("PPC, Red = generated/estimated, Blue = real/ground truth") -->
<!-- fit_rec %>% spread_draws(y_pred[n,t]) %>% sample_draws(ndraws = 1, seed = 789) %>% ggplot(aes(x = y_pred)) + stat_dots(quantiles = 100, fill = "red", alpha = 0.5) + stat_dots(data = as.data.frame(as.vector(y)), aes(x = `as.vector(y)`+0.1), quantiles = 100, fill = "blue", alpha = 0.5, inherit.aes = TRUE) + ggtitle("PPC Y, Red = generated/estimated, Blue = real/ground truth") -->
<!-- # fit_rec %>% spread_draws(psi_pred[n]) %>% ggplot(aes(x = psi_pred)) + stat_dots(quantiles = 100, fill = "red", alpha = 0.5) + stat_dots(data = as.data.frame(psi), aes(x = psi), quantiles = 100, fill = "blue", alpha = 0.5, inherit.aes = TRUE) + ggtitle("PPC, Red = generated/estimated, Blue = real/ground truth") -->
<!-- # Merge the two dfs to show real and estimated params side by side. -->
<!-- # bayesplot::mcmc_trace(fit_rec$draws(), pars = "gp_sigma") -->
<!-- fit_rec %>% -->
<!-- gather_draws(gp_theta, gp_sigma) %>% ungroup() %>% ggplot(aes(y = .variable, x = .value)) + stat_eye() + facet_wrap(~.variable, scales = "free_x") -->
<!-- fit_gen %>% gather_draws(gp_theta, gp_sigma) %>% ggplot(aes(y = .variable, x = .value)) + stat_eye() + facet_wrap(~.variable, scales = "free_x") -->
<!-- merge_prior_post <- function(gen,rec,...){ -->
<!-- df1 <- gen %>% -->
<!-- gather_draws(...) %>% ungroup() %>% mutate(model = "ground truth") -->
<!-- df2 <- rec %>% -->
<!-- gather_draws(...) %>% ungroup() %>% mutate(model = "posterior") -->
<!-- df <- bind_rows(df1,df2) %>% mutate(model = as.factor(model)) -->
<!-- return(df) -->
<!-- } -->
<!-- merged_df <- merge_prior_post(fit_gen,fit_rec,gp_theta, gp_sigma) -->
<!-- merged_df %>% -->
<!-- # filter(model == "prior") %>% -->
<!-- ggplot(aes(y = .variable, x = .value)) + -->
<!-- stat_dotsinterval(aes(group = model, -->
<!-- slab_colour = model, -->
<!-- slab_fill = model, -->
<!-- interval_colour = model, -->
<!-- point_colour = model)) + -->
<!-- facet_wrap(~.variable, scales = "free_x") -->
<!-- # fit_rec$loo(r_eff= TRUE) -->
<!-- # fit_psi$summary() -->
<!-- # fit_gen %>% spread_draws(theta[n]) %>% ggplot(aes(x = theta)) + stat_dots(quantiles = 100) -->
<!-- # -->
<!-- # # fit_gen %>% spread_draws(psi_pred[n]) %>% ggplot(aes(x = psi_pred)) + stat_dots(quantiles = 100) -->
<!-- # -->
<!-- # theta <- fit_gen %>% spread_draws(theta[n]) %>% sample_draws(ndraws = 1, seed = 789) %>% pull(theta) -->
<!-- # -->
<!-- # hist(theta) -->
<!-- # -->
<!-- # # psi <- fit_gen %>% spread_draws(psi_pred[n]) %>% sample_draws(ndraws = 1, seed = 789) %>% pull(psi_pred) -->
<!-- # -->
<!-- # # hist(psi) -->
<!-- # -->
<!-- # fit_gen$summary() -->
<!-- # fit_rec$summary() -->
<!-- # -->
<!-- # y <- fit_gen %>% spread_draws(y_pred[n,t]) %>% sample_draws(ndraws = 1, seed = 789) %>% pull(y_pred) %>% matrix(nrow = 10, ncol =100) %>% t() -->
<!-- ``` -->
<!-- The model recovers acceptably. The phi_approx transform shifts the parameter: a latent group mean of 0.3 maps to an actual theta mean of roughly 0.6. -->
<!-- The model was specified as: -->
<!-- $$\theta = \Phi_{\text{approx}}(\mu + \sigma \cdot \text{individual variance} + m\psi)$$ -->
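The size of this shift can be checked outside Stan. A minimal sketch in R, using `pnorm()` (the exact standard normal CDF) alongside the logistic approximation that Stan's `Phi_approx()` is based on; the value 0.3 is the latent group mean used above:

```r
# Phi maps the unbounded latent scale onto (0, 1).
latent_mu <- 0.3
pnorm(latent_mu)  # ~0.618: a latent group mean of 0.3 implies a theta mean near 0.6

# Stan's Phi_approx() uses this logistic approximation to the normal CDF:
phi_approx <- function(x) plogis(0.07056 * x^3 + 1.5976 * x)
phi_approx(latent_mu)  # close to pnorm(latent_mu)
```

This is why a prior centred at 0.3 on the latent scale produces theta values centred near 0.6 after the transform.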
<!-- ```{r} -->
<!-- file <- "~/FLARe/FLARe_Modelling/Stan_experimental/normal_hierarchical_embed_phi.stan" -->
<!-- mod <- cmdstan_model(file) -->
<!-- mod$check_syntax(pedantic = TRUE) -->
<!-- for (i in c(0.1,0.3,0.5,0.7,0.9)){ -->
<!-- datalist <- list(N = 100, -->
<!-- T = 10, -->
<!-- # theta = rnorm(100,0,1), -->
<!-- # psi = rnorm(100,0,1), -->
<!-- y = matrix(data = 0, nrow = 100, ncol = 10), -->
<!-- prior_only = 1, -->
<!-- gp_theta_mu_pr = 0.3, -->
<!-- gp_theta_sigma_pr = 0.1, -->
<!-- gp_sigma_mu_pr = 0.1, -->
<!-- gp_sigma_sigma_pr = 0.1, -->
<!-- ind_scale_mu_pr = 0, -->
<!-- ind_scale_sigma_pr = 0.1, -->
<!-- gp_m_mu_pr = i, -->
<!-- gp_m_sigma_pr = 0.01, -->
<!-- psi_pr = 1, -->
<!-- psi_y = rep(0,100) -->
<!-- ) -->
<!-- fit_gen <- mod$sample( -->
<!-- data = datalist, -->
<!-- seed = 123, -->
<!-- chains = 4, -->
<!-- parallel_chains = 4, -->
<!-- refresh = 100, -->
<!-- iter_warmup = 250, -->
<!-- iter_sampling = 1000, -->
<!-- # fixed_param = TRUE -->
<!-- ) -->
<!-- y_tot <- fit_gen %>% spread_draws(y_pred[n,t]) %>% sample_draws(ndraws = 100, seed = 789) %>% ungroup(t) %>% summarise(total = sum(y_pred)/100) -->
<!-- theta_med <- fit_gen %>% spread_draws(theta[n]) %>% summarise(med = median(theta)) %>% pull(med) %>% median() -->
<!-- plt <- fit_gen %>% spread_draws(psi_pred[n]) %>% sample_draws(ndraws = 1, seed = 789) %>% select(n,psi_pred) %>% left_join(y_tot, by = "n") %>% ggplot(aes(x = psi_pred, y = total)) + -->
<!-- geom_point() + -->
<!-- geom_smooth(method = "lm") + -->
<!-- ggtitle(paste("m set at",i,"y mean is:",mean(y_tot$total),"theta median is:",theta_med)) -->
<!-- assign(paste("p",i,sep = "_"),plt) -->
<!-- } -->
<!-- p_0.1 + p_0.3 + p_0.5 + p_0.7 + p_0.9 -->
<!-- ``` -->
<!-- ### Recovery -->
<!-- ```{r} -->
<!-- file <- "~/FLARe/FLARe_Modelling/Stan_experimental/normal_hierarchical_embed_phi.stan" -->
<!-- mod <- cmdstan_model(file) -->
<!-- mod$check_syntax(pedantic = TRUE) -->
<!-- t_mult <- 1 -->
<!-- datalist <- list(N = 100, -->
<!-- T = 10*t_mult, -->
<!-- # theta = rnorm(100,0,1), -->
<!-- # psi = rnorm(100,0,1), -->
<!-- y = matrix(data = 0, nrow = 100, ncol = 10*t_mult), -->
<!-- prior_only = 1, -->
<!-- gp_theta_mu_pr = 0.3, -->
<!-- gp_theta_sigma_pr = 0.1, -->
<!-- gp_sigma_mu_pr = 0.1, -->
<!-- gp_sigma_sigma_pr = 0.1, -->
<!-- ind_scale_mu_pr = 0, -->
<!-- ind_scale_sigma_pr = 0.1, -->
<!-- gp_m_mu_pr = 0.3, -->
<!-- gp_m_sigma_pr = 0.01, -->
<!-- psi_pr = 1, -->
<!-- psi_y = rep(0,100) -->
<!-- ) -->
<!-- fit_gen <- mod$sample( -->
<!-- data = datalist, -->
<!-- seed = 123, -->
<!-- chains = 4, -->
<!-- parallel_chains = 4, -->
<!-- refresh = 100, -->
<!-- iter_warmup = 250, -->
<!-- iter_sampling = 1000, -->
<!-- # fixed_param = TRUE -->
<!-- ) -->
<!-- fit_gen %>% spread_draws(theta[n]) %>% ggplot(aes(x = theta)) + stat_dots(quantiles = 100) -->
<!-- fit_gen %>% spread_draws(psi_pred[n]) %>% ggplot(aes(x = psi_pred)) + stat_dots(quantiles = 100) -->
<!-- theta <- fit_gen %>% spread_draws(theta[n]) %>% sample_draws(ndraws = 1, seed = 789) %>% pull(theta) -->
<!-- hist(theta) -->
<!-- psi <- fit_gen %>% spread_draws(psi_pred[n]) %>% sample_draws(ndraws = 1, seed = 789) %>% pull(psi_pred) -->
<!-- hist(psi) -->
<!-- # fit_gen$summary() -->
<!-- y <- fit_gen %>% spread_draws(y_pred[n,t]) %>% sample_draws(ndraws = 1, seed = 789) %>% pull(y_pred) %>% matrix(nrow = 10*t_mult, ncol =100) %>% t() -->
<!-- # fit_gen %>% spread_draws(psi_pred[n]) %>% sample_draws(ndraws = 1, seed = 789) %>% select(n,psi_pred) %>% left_join(y_tot, by = "n") %>% ggplot(aes(x = psi_pred, y = total)) + geom_point() + geom_smooth(method = "lm") -->
<!-- # -->
<!-- # pull(y_pred) %>% matrix(nrow = 10, ncol =100) %>% t() -->
<!-- # -->
<!-- # hist(y_tot$total) -->
<!-- datalist <- list(N = 100, -->
<!-- T = 10*t_mult, -->
<!-- # theta = rnorm(100,0,1), -->
<!-- # psi = rnorm(100,0,1), -->
<!-- y = y, -->
<!-- prior_only = 0, -->
<!-- gp_theta_mu_pr = 0, -->
<!-- gp_theta_sigma_pr = 1, -->
<!-- gp_sigma_mu_pr = 0, -->
<!-- gp_sigma_sigma_pr = 1, -->
<!-- ind_scale_mu_pr = 0, -->
<!-- ind_scale_sigma_pr = 1, -->
<!-- gp_m_mu_pr = 0.3, -->
<!-- gp_m_sigma_pr = 1, -->
<!-- psi_pr = 1, -->
<!-- psi_y = psi -->
<!-- ) -->
<!-- fit_rec <- mod$sample( -->
<!-- data = datalist, -->
<!-- seed = 123, -->
<!-- chains = 4, -->
<!-- parallel_chains = 4, -->
<!-- refresh = 100, -->
<!-- iter_warmup = 250, -->
<!-- iter_sampling = 1000, -->
<!-- # fixed_param = TRUE -->
<!-- ) -->
<!-- fit_rec$summary() -->
<!-- bayesplot::mcmc_trace(fit_rec$draws(), pars = "gp_theta") -->
<!-- fit_rec %>% spread_draws(theta[n]) %>% ggplot(aes(x = theta)) + stat_dots(quantiles = 100, fill = "red", alpha = 0.5) + stat_dots(data = as.data.frame(theta), aes(x = theta), quantiles = 100, fill = "blue", alpha = 0.5, inherit.aes = TRUE) + ggtitle("PPC, Red = generated/estimated, Blue = real/ground truth") -->
<!-- fit_rec %>% spread_draws(y_pred[n,t]) %>% sample_draws(ndraws = 1, seed = 789) %>% ggplot(aes(x = y_pred)) + stat_dots(quantiles = 100, fill = "red", alpha = 0.5) + stat_dots(data = as.data.frame(as.vector(y)), aes(x = `as.vector(y)`+0.1), quantiles = 100, fill = "blue", alpha = 0.5, inherit.aes = TRUE) + ggtitle("PPC Y, Red = generated/estimated, Blue = real/ground truth") -->
<!-- # fit_rec %>% spread_draws(psi_pred[n]) %>% ggplot(aes(x = psi_pred)) + stat_dots(quantiles = 100, fill = "red", alpha = 0.5) + stat_dots(data = as.data.frame(psi), aes(x = psi), quantiles = 100, fill = "blue", alpha = 0.5, inherit.aes = TRUE) + ggtitle("PPC, Red = generated/estimated, Blue = real/ground truth") -->
<!-- # Merge the two dfs to show real and estimated params side by side. -->
<!-- # bayesplot::mcmc_trace(fit_rec$draws(), pars = "gp_sigma") -->
<!-- fit_rec %>% -->
<!-- gather_draws(gp_theta, gp_sigma) %>% ungroup() %>% ggplot(aes(y = .variable, x = .value)) + stat_eye() + facet_wrap(~.variable, scales = "free_x") -->
<!-- fit_gen %>% gather_draws(gp_theta, gp_sigma) %>% ggplot(aes(y = .variable, x = .value)) + stat_eye() + facet_wrap(~.variable, scales = "free_x") -->
<!-- merge_prior_post <- function(gen,rec,...){ -->
<!-- df1 <- gen %>% -->
<!-- gather_draws(...) %>% ungroup() %>% mutate(model = "ground truth") -->
<!-- df2 <- rec %>% -->
<!-- gather_draws(...) %>% ungroup() %>% mutate(model = "posterior") -->
<!-- df <- bind_rows(df1,df2) %>% mutate(model = as.factor(model)) -->
<!-- return(df) -->
<!-- } -->
<!-- merged_df <- merge_prior_post(fit_gen,fit_rec,gp_theta, gp_sigma, gp_m) -->
<!-- merged_df %>% -->
<!-- # filter(model == "prior") %>% -->
<!-- ggplot(aes(y = .variable, x = .value)) + -->
<!-- stat_dotsinterval(aes(group = model, -->
<!-- slab_colour = model, -->
<!-- slab_fill = model, -->
<!-- interval_colour = model, -->
<!-- point_colour = model)) + -->
<!-- facet_wrap(~.variable, scales = "free_x") -->
<!-- ``` -->
<!-- This model failed to recover the parameters, although it reproduces the data effectively. The simulated data may be too simple to constrain the complex distributional shape of theta. -->
<!-- This model uses an exponential distribution for psi. The result is always positive, so the model assumes the psi score can only push theta in one direction (adding to it). In reality the effect should run in either direction: a GAD score of 2 indicates low anxiety and should act in the opposite direction to a score of 15. Exactly where the cutoff lies is fuzzy and may itself need estimating. -->
<!-- It may be better to reshape the distribution around 0, allowing effects in either direction. -->
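The asymmetry can be simulated directly; a short sketch (the rate and sd of 1 here mirror `psi_pr = 1` and are only illustrative):

```r
set.seed(123)
psi_exp  <- rexp(1e4, rate = 1)   # exponential: strictly positive, so m * psi only ever adds
psi_norm <- rnorm(1e4, 0, 1)      # centred at 0: m * psi can raise or lower theta
min(psi_exp)                      # > 0 by construction
mean(psi_norm < 0)                # ~half the draws pull in the opposite direction
```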
<!-- ```{r} -->
<!-- file <- "~/FLARe/FLARe_Modelling/Stan_experimental/normal_hierarchical_embed_phi_st.stan" -->
<!-- mod <- cmdstan_model(file) -->
<!-- mod$check_syntax(pedantic = TRUE) -->
<!-- t_mult <- 1 -->
<!-- datalist <- list(N = 100, -->
<!-- T = 10*t_mult, -->
<!-- # theta = rnorm(100,0,1), -->
<!-- # psi = rnorm(100,0,1), -->
<!-- y = matrix(data = 0, nrow = 100, ncol = 10*t_mult), -->
<!-- prior_only = 1, -->
<!-- gp_theta_mu_pr = 0.3, -->
<!-- gp_theta_sigma_pr = 0.1, -->
<!-- gp_sigma_mu_pr = 0.1, -->
<!-- gp_sigma_sigma_pr = 0.1, -->
<!-- ind_scale_mu_pr = 0, -->
<!-- ind_scale_sigma_pr = 0.1, -->
<!-- gp_m_mu_pr = 0.3, -->
<!-- gp_m_sigma_pr = 0.01, -->
<!-- psi_pr = 1, -->
<!-- psi_y = rep(0,100) -->
<!-- ) -->
<!-- fit_gen <- mod$sample( -->
<!-- data = datalist, -->
<!-- seed = 123, -->
<!-- chains = 4, -->
<!-- parallel_chains = 4, -->
<!-- refresh = 100, -->
<!-- iter_warmup = 250, -->
<!-- iter_sampling = 1000, -->
<!-- # fixed_param = TRUE -->
<!-- ) -->
<!-- fit_gen %>% spread_draws(theta[n]) %>% ggplot(aes(x = theta)) + stat_dots(quantiles = 100) -->
<!-- fit_gen %>% spread_draws(psi_pred[n]) %>% ggplot(aes(x = psi_pred)) + stat_dots(quantiles = 100) -->
<!-- theta <- fit_gen %>% spread_draws(theta[n]) %>% sample_draws(ndraws = 1, seed = 789) %>% pull(theta) -->
<!-- hist(theta) -->
<!-- psi <- fit_gen %>% spread_draws(psi_pred[n]) %>% sample_draws(ndraws = 1, seed = 789) %>% pull(psi_pred) -->
<!-- hist(psi) -->
<!-- # fit_gen$summary() -->
<!-- y <- fit_gen %>% spread_draws(y_pred[n,t]) %>% sample_draws(ndraws = 1, seed = 789) %>% pull(y_pred) %>% matrix(nrow = 10*t_mult, ncol =100) %>% t() -->
<!-- # fit_gen %>% spread_draws(psi_pred[n]) %>% sample_draws(ndraws = 1, seed = 789) %>% select(n,psi_pred) %>% left_join(y_tot, by = "n") %>% ggplot(aes(x = psi_pred, y = total)) + geom_point() + geom_smooth(method = "lm") -->
<!-- # -->
<!-- # pull(y_pred) %>% matrix(nrow = 10, ncol =100) %>% t() -->
<!-- # -->
<!-- # hist(y_tot$total) -->
<!-- datalist <- list(N = 100, -->
<!-- T = 10*t_mult, -->
<!-- # theta = rnorm(100,0,1), -->
<!-- # psi = rnorm(100,0,1), -->
<!-- y = y, -->
<!-- prior_only = 0, -->
<!-- gp_theta_mu_pr = 0, -->
<!-- gp_theta_sigma_pr = 1, -->
<!-- gp_sigma_mu_pr = 0, -->
<!-- gp_sigma_sigma_pr = 1, -->
<!-- ind_scale_mu_pr = 0, -->
<!-- ind_scale_sigma_pr = 1, -->
<!-- gp_m_mu_pr = 0.3, -->
<!-- gp_m_sigma_pr = 1, -->
<!-- psi_pr = 1, -->
<!-- psi_y = psi -->
<!-- ) -->
<!-- fit_rec <- mod$sample( -->
<!-- data = datalist, -->
<!-- seed = 123, -->
<!-- chains = 4, -->
<!-- parallel_chains = 4, -->
<!-- refresh = 100, -->
<!-- iter_warmup = 250, -->
<!-- iter_sampling = 1000, -->
<!-- # fixed_param = TRUE -->
<!-- ) -->
<!-- fit_rec$summary() -->
<!-- bayesplot::mcmc_trace(fit_rec$draws(), pars = "gp_theta") -->
<!-- fit_rec %>% spread_draws(theta[n]) %>% ggplot(aes(x = theta)) + stat_dots(quantiles = 100, fill = "red", alpha = 0.5) + stat_dots(data = as.data.frame(theta), aes(x = theta), quantiles = 100, fill = "blue", alpha = 0.5, inherit.aes = TRUE) + ggtitle("PPC, Red = generated/estimated, Blue = real/ground truth") -->
<!-- fit_rec %>% spread_draws(y_pred[n,t]) %>% sample_draws(ndraws = 1, seed = 789) %>% ggplot(aes(x = y_pred)) + stat_dots(quantiles = 100, fill = "red", alpha = 0.5) + stat_dots(data = as.data.frame(as.vector(y)), aes(x = `as.vector(y)`+0.1), quantiles = 100, fill = "blue", alpha = 0.5, inherit.aes = TRUE) + ggtitle("PPC Y, Red = generated/estimated, Blue = real/ground truth") -->
<!-- # fit_rec %>% spread_draws(psi_pred[n]) %>% ggplot(aes(x = psi_pred)) + stat_dots(quantiles = 100, fill = "red", alpha = 0.5) + stat_dots(data = as.data.frame(psi), aes(x = psi), quantiles = 100, fill = "blue", alpha = 0.5, inherit.aes = TRUE) + ggtitle("PPC, Red = generated/estimated, Blue = real/ground truth") -->
<!-- # Merge the two dfs to show real and estimated params side by side. -->
<!-- # bayesplot::mcmc_trace(fit_rec$draws(), pars = "gp_sigma") -->
<!-- fit_rec %>% -->
<!-- gather_draws(gp_theta, gp_sigma) %>% ungroup() %>% ggplot(aes(y = .variable, x = .value)) + stat_eye() + facet_wrap(~.variable, scales = "free_x") -->
<!-- fit_gen %>% gather_draws(gp_theta, gp_sigma) %>% ggplot(aes(y = .variable, x = .value)) + stat_eye() + facet_wrap(~.variable, scales = "free_x") -->
<!-- merge_prior_post <- function(gen,rec,...){ -->
<!-- df1 <- gen %>% -->
<!-- gather_draws(...) %>% ungroup() %>% mutate(model = "ground truth") -->
<!-- df2 <- rec %>% -->
<!-- gather_draws(...) %>% ungroup() %>% mutate(model = "posterior") -->
<!-- df <- bind_rows(df1,df2) %>% mutate(model = as.factor(model)) -->
<!-- return(df) -->
<!-- } -->
<!-- merged_df <- merge_prior_post(fit_gen,fit_rec,gp_theta, gp_sigma,gp_m) -->
<!-- merged_df %>% -->
<!-- # filter(model == "prior") %>% -->
<!-- ggplot(aes(y = .variable, x = .value)) + -->
<!-- stat_dotsinterval(aes(group = model, -->
<!-- slab_colour = model, -->
<!-- slab_fill = model, -->
<!-- interval_colour = model, -->
<!-- point_colour = model)) + -->
<!-- facet_wrap(~.variable, scales = "free_x") -->
<!-- ``` -->
<!-- This appears to have the effect of underestimating most parameters. -->
<!-- The generated data were made more complex by using a normal_rng instead of a bernoulli_rng, to see whether the extra parameters would be useful. -->
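The extra information a continuous likelihood carries can be sketched outside the model (theta = 0.6 and sd = 0.1 are illustrative values, not the Stan model's):

```r
set.seed(123)
theta <- 0.6
y_bern <- rbinom(10, size = 1, prob = theta)  # 0/1 only: T trials carry limited information
y_norm <- rnorm(10, mean = theta, sd = 0.1)   # continuous: every trial also informs the scale
table(y_bern)                                 # collapses to counts of 0s and 1s
round(y_norm, 2)                              # retains per-trial magnitude
```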
<!-- ```{r} -->
<!-- file <- "~/FLARe/FLARe_Modelling/Stan_experimental/normal_hierarchical_embed_phi_st_normrng.stan" -->
<!-- mod <- cmdstan_model(file) -->
<!-- mod$check_syntax(pedantic = TRUE) -->
<!-- t_mult <- 1 -->
<!-- datalist <- list(N = 100, -->
<!-- T = 10*t_mult, -->
<!-- # theta = rnorm(100,0,1), -->
<!-- # psi = rnorm(100,0,1), -->
<!-- y = matrix(data = 0, nrow = 100, ncol = 10*t_mult), -->
<!-- prior_only = 1, -->
<!-- gp_theta_mu_pr = 0.3, -->
<!-- gp_theta_sigma_pr = 0.1, -->
<!-- gp_sigma_mu_pr = 0.1, -->
<!-- gp_sigma_sigma_pr = 0.1, -->
<!-- ind_scale_mu_pr = 0, -->
<!-- ind_scale_sigma_pr = 0.1, -->
<!-- gp_m_mu_pr = 0.3, -->
<!-- gp_m_sigma_pr = 0.01, -->
<!-- psi_pr = 1, -->
<!-- psi_y = rep(0,100) -->
<!-- ) -->
<!-- fit_gen <- mod$sample( -->
<!-- data = datalist, -->
<!-- seed = 123, -->
<!-- chains = 4, -->
<!-- parallel_chains = 4, -->
<!-- refresh = 100, -->
<!-- iter_warmup = 250, -->
<!-- iter_sampling = 1000, -->
<!-- # fixed_param = TRUE -->
<!-- ) -->
<!-- fit_gen %>% spread_draws(theta[n]) %>% ggplot(aes(x = theta)) + stat_dots(quantiles = 100) -->
<!-- fit_gen %>% spread_draws(gp_theta_mu) %>% ggplot(aes(x = gp_theta_mu)) + stat_dots(quantiles = 100) -->
<!-- fit_gen %>% spread_draws(psi_pred[n]) %>% ggplot(aes(x = psi_pred)) + stat_dots(quantiles = 100) -->
<!-- theta <- fit_gen %>% spread_draws(theta[n]) %>% sample_draws(ndraws = 1, seed = 789) %>% pull(theta) -->
<!-- hist(theta) -->
<!-- psi <- fit_gen %>% spread_draws(psi_pred[n]) %>% sample_draws(ndraws = 1, seed = 789) %>% pull(psi_pred) -->
<!-- hist(psi) -->
<!-- # fit_gen$summary() -->
<!-- y <- fit_gen %>% spread_draws(y_pred[n,t]) %>% sample_draws(ndraws = 1, seed = 789) %>% pull(y_pred) %>% matrix(nrow = 10*t_mult, ncol =100) %>% t() -->
<!-- hist(y) -->
<!-- # fit_gen %>% spread_draws(psi_pred[n]) %>% sample_draws(ndraws = 1, seed = 789) %>% select(n,psi_pred) %>% left_join(y_tot, by = "n") %>% ggplot(aes(x = psi_pred, y = total)) + geom_point() + geom_smooth(method = "lm") -->
<!-- # -->
<!-- # pull(y_pred) %>% matrix(nrow = 10, ncol =100) %>% t() -->
<!-- # -->
<!-- # hist(y_tot$total) -->
<!-- datalist <- list(N = 100, -->
<!-- T = 10*t_mult, -->
<!-- # theta = rnorm(100,0,1), -->
<!-- # psi = rnorm(100,0,1), -->
<!-- y = y, -->
<!-- prior_only = 0, -->
<!-- gp_theta_mu_pr = 0, -->
<!-- gp_theta_sigma_pr = 1, -->
<!-- gp_sigma_mu_pr = 0, -->
<!-- gp_sigma_sigma_pr = 1, -->
<!-- ind_scale_mu_pr = 0, -->
<!-- ind_scale_sigma_pr = 1, -->
<!-- gp_m_mu_pr = 0.3, -->
<!-- gp_m_sigma_pr = 1, -->
<!-- psi_pr = 1, -->
<!-- psi_y = psi -->
<!-- ) -->
<!-- fit_rec <- mod$sample( -->
<!-- data = datalist, -->
<!-- seed = 123, -->
<!-- chains = 4, -->
<!-- parallel_chains = 4, -->
<!-- refresh = 100, -->
<!-- iter_warmup = 250, -->
<!-- iter_sampling = 1000, -->
<!-- # fixed_param = TRUE -->
<!-- ) -->
<!-- fit_rec$summary() -->
<!-- bayesplot::mcmc_trace(fit_rec$draws(), pars = "gp_theta") -->
<!-- fit_rec %>% spread_draws(theta[n]) %>% ggplot(aes(x = theta)) + stat_dots(quantiles = 100, fill = "red", alpha = 0.5) + stat_dots(data = as.data.frame(theta), aes(x = theta), quantiles = 100, fill = "blue", alpha = 0.5, inherit.aes = TRUE) + ggtitle("PPC, Red = generated/estimated, Blue = real/ground truth") -->
<!-- fit_rec %>% spread_draws(y_pred[n,t]) %>% sample_draws(ndraws = 1, seed = 789) %>% ggplot(aes(x = y_pred)) + stat_dots(quantiles = 100, fill = "red", alpha = 0.5) + stat_dots(data = as.data.frame(as.vector(y)), aes(x = `as.vector(y)`+0.1), quantiles = 100, fill = "blue", alpha = 0.5, inherit.aes = TRUE) + ggtitle("PPC Y, Red = generated/estimated, Blue = real/ground truth") -->
<!-- # fit_rec %>% spread_draws(psi_pred[n]) %>% ggplot(aes(x = psi_pred)) + stat_dots(quantiles = 100, fill = "red", alpha = 0.5) + stat_dots(data = as.data.frame(psi), aes(x = psi), quantiles = 100, fill = "blue", alpha = 0.5, inherit.aes = TRUE) + ggtitle("PPC, Red = generated/estimated, Blue = real/ground truth") -->
<!-- # Merge the two dfs to show real and estimated params side by side. -->
<!-- # bayesplot::mcmc_trace(fit_rec$draws(), pars = "gp_sigma") -->
<!-- fit_rec %>% -->
<!-- gather_draws(gp_theta, gp_sigma) %>% ungroup() %>% ggplot(aes(y = .variable, x = .value)) + stat_eye() + facet_wrap(~.variable, scales = "free_x") -->
<!-- fit_gen %>% gather_draws(gp_theta, gp_sigma) %>% ggplot(aes(y = .variable, x = .value)) + stat_eye() + facet_wrap(~.variable, scales = "free_x") -->
<!-- merge_prior_post <- function(gen,rec,...){ -->
<!-- df1 <- gen %>% -->
<!-- gather_draws(...) %>% ungroup() %>% mutate(model = "ground truth") -->
<!-- df2 <- rec %>% -->
<!-- gather_draws(...) %>% ungroup() %>% mutate(model = "posterior") -->
<!-- df <- bind_rows(df1,df2) %>% mutate(model = as.factor(model)) -->
<!-- return(df) -->
<!-- } -->
<!-- merged_df <- merge_prior_post(fit_gen,fit_rec,gp_theta, gp_sigma,gp_m) -->
<!-- merged_df %>% -->
<!-- # filter(model == "prior") %>% -->
<!-- ggplot(aes(y = .variable, x = .value)) + -->
<!-- stat_dotsinterval(aes(group = model, -->
<!-- slab_colour = model, -->
<!-- slab_fill = model, -->
<!-- interval_colour = model, -->
<!-- point_colour = model)) + -->
<!-- facet_wrap(~.variable, scales = "free_x") -->
<!-- ``` -->
<!-- Still off the mark. Reverting to built-in distribution functions instead of manual transforms and phi_approx. -->
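The equivalence being relied on when switching to built-in functions: drawing z ~ N(0,1) and transforming gives the same distribution as sampling from the target normal directly, but the built-in density functions are simpler to keep consistent. A sketch (the mu and sigma values are illustrative):

```r
set.seed(123)
mu <- 0.3; sigma <- 0.1
# Manual non-centred construction, as in the transformed-parameter approach:
theta_manual  <- mu + sigma * rnorm(1e5)
# Built-in equivalent, as normal_rng(mu, sigma) would give in Stan:
theta_builtin <- rnorm(1e5, mu, sigma)
c(mean(theta_manual), mean(theta_builtin))  # both ~0.3
c(sd(theta_manual), sd(theta_builtin))      # both ~0.1
```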
<!-- ```{r} -->
<!-- file <- "~/FLARe/FLARe_Modelling/Stan_experimental/normal_hierarchical_embed_phi_st_normrng_normlpdf.stan" -->
<!-- mod <- cmdstan_model(file) -->
<!-- mod$check_syntax(pedantic = TRUE) -->
<!-- t_mult <- 1 -->
<!-- datalist <- list(N = 100, -->
<!-- T = 10*t_mult, -->
<!-- # theta = rnorm(100,0,1), -->
<!-- # psi = rnorm(100,0,1), -->
<!-- y = matrix(data = 0, nrow = 100, ncol = 10*t_mult), -->
<!-- prior_only = 1, -->
<!-- gp_theta_mu_pr = 0.3, -->
<!-- gp_theta_sigma_pr = 0.1, -->
<!-- gp_sigma_mu_pr = 0.1, -->
<!-- gp_sigma_sigma_pr = 0.1, -->
<!-- ind_scale_mu_pr = 0, -->
<!-- ind_scale_sigma_pr = 0.1, -->
<!-- gp_m_mu_pr = 0.3, -->
<!-- gp_m_sigma_pr = 0.01, -->
<!-- psi_pr = 1, -->
<!-- psi_y = rep(0,100) -->
<!-- ) -->
<!-- fit_gen <- mod$sample( -->
<!-- data = datalist, -->
<!-- seed = 123, -->
<!-- chains = 4, -->
<!-- parallel_chains = 4, -->
<!-- refresh = 100, -->
<!-- iter_warmup = 250, -->
<!-- iter_sampling = 1000, -->
<!-- # fixed_param = TRUE -->
<!-- ) -->
<!-- theta <- fit_gen %>% spread_draws(theta[n]) %>% sample_draws(ndraws = 1, seed = 789) %>% pull(theta) -->
<!-- psi <- fit_gen %>% spread_draws(psi_pred[n]) %>% sample_draws(ndraws = 1, seed = 789) %>% pull(psi_pred) -->
<!-- y <- fit_gen %>% spread_draws(y_pred[n,t]) %>% sample_draws(ndraws = 1, seed = 789) %>% pull(y_pred) %>% matrix(nrow = 10*t_mult, ncol =100) %>% t() -->
<!-- datalist <- list(N = 100, -->
<!-- T = 10*t_mult, -->
<!-- # theta = rnorm(100,0,1), -->
<!-- # psi = rnorm(100,0,1), -->
<!-- y = y, -->
<!-- prior_only = 0, -->
<!-- gp_theta_mu_pr = 0, -->
<!-- gp_theta_sigma_pr = 1, -->
<!-- gp_sigma_mu_pr = 0, -->
<!-- gp_sigma_sigma_pr = 1, -->
<!-- ind_scale_mu_pr = 0, -->
<!-- ind_scale_sigma_pr = 1, -->
<!-- gp_m_mu_pr = 0.3, -->
<!-- gp_m_sigma_pr = 1, -->
<!-- psi_pr = 1, -->
<!-- psi_y = psi -->
<!-- ) -->
<!-- fit_rec <- mod$sample( -->
<!-- data = datalist, -->
<!-- seed = 123, -->
<!-- chains = 4, -->
<!-- parallel_chains = 4, -->
<!-- refresh = 100, -->
<!-- iter_warmup = 250, -->
<!-- iter_sampling = 1000, -->
<!-- # fixed_param = TRUE -->
<!-- ) -->
<!-- fit_rec$summary() -->
<!-- merged_df <- merge_prior_post(fit_gen,fit_rec,gp_theta, gp_sigma,gp_m) -->
<!-- merged_df %>% -->
<!-- # filter(model == "prior") %>% -->
<!-- ggplot(aes(y = .variable, x = .value)) + -->
<!-- stat_dotsinterval(aes(group = model, -->
<!-- slab_colour = model, -->
<!-- slab_fill = model, -->
<!-- interval_colour = model, -->
<!-- point_colour = model)) + -->
<!-- facet_wrap(~.variable, scales = "free_x") -->
<!-- ``` -->
<!-- Doesn't appear to improve things... -->
<!-- ```{r, eval=FALSE} -->
<!-- fit_gen %>% spread_draws(theta[n]) %>% ggplot(aes(x = theta)) + stat_dots(quantiles = 100) -->
<!-- fit_gen %>% spread_draws(gp_theta_mu) %>% ggplot(aes(x = gp_theta_mu)) + stat_dots(quantiles = 100) -->
<!-- fit_gen %>% spread_draws(psi_pred[n]) %>% ggplot(aes(x = psi_pred)) + stat_dots(quantiles = 100) -->
<!-- hist(theta) -->
<!-- hist(psi) -->
<!-- # fit_gen$summary() -->
<!-- hist(y) -->
<!-- # fit_gen %>% spread_draws(psi_pred[n]) %>% sample_draws(ndraws = 1, seed = 789) %>% select(n,psi_pred) %>% left_join(y_tot, by = "n") %>% ggplot(aes(x = psi_pred, y = total)) + geom_point() + geom_smooth(method = "lm") -->
<!-- # -->
<!-- # pull(y_pred) %>% matrix(nrow = 10, ncol =100) %>% t() -->
<!-- # -->
<!-- # hist(y_tot$total) -->
<!-- bayesplot::mcmc_trace(fit_rec$draws(), pars = "gp_theta") -->
<!-- fit_rec %>% spread_draws(theta[n]) %>% ggplot(aes(x = theta)) + stat_dots(quantiles = 100, fill = "red", alpha = 0.5) + stat_dots(data = as.data.frame(theta), aes(x = theta), quantiles = 100, fill = "blue", alpha = 0.5, inherit.aes = TRUE) + ggtitle("PPC, Red = generated/estimated, Blue = real/ground truth") -->
<!-- fit_rec %>% spread_draws(y_pred[n,t]) %>% sample_draws(ndraws = 1, seed = 789) %>% ggplot(aes(x = y_pred)) + stat_dots(quantiles = 100, fill = "red", alpha = 0.5) + stat_dots(data = as.data.frame(as.vector(y)), aes(x = `as.vector(y)`+0.1), quantiles = 100, fill = "blue", alpha = 0.5, inherit.aes = TRUE) + ggtitle("PPC Y, Red = generated/estimated, Blue = real/ground truth") -->
<!-- # fit_rec %>% spread_draws(psi_pred[n]) %>% ggplot(aes(x = psi_pred)) + stat_dots(quantiles = 100, fill = "red", alpha = 0.5) + stat_dots(data = as.data.frame(psi), aes(x = psi), quantiles = 100, fill = "blue", alpha = 0.5, inherit.aes = TRUE) + ggtitle("PPC, Red = generated/estimated, Blue = real/ground truth") -->
<!-- # Merge the two dfs to show real and estimated params side by side. -->
<!-- # bayesplot::mcmc_trace(fit_rec$draws(), pars = "gp_sigma") -->
<!-- fit_rec %>% -->
<!-- gather_draws(gp_theta, gp_sigma) %>% ungroup() %>% ggplot(aes(y = .variable, x = .value)) + stat_eye() + facet_wrap(~.variable, scales = "free_x") -->
<!-- fit_gen %>% gather_draws(gp_theta, gp_sigma) %>% ggplot(aes(y = .variable, x = .value)) + stat_eye() + facet_wrap(~.variable, scales = "free_x") -->
<!-- merge_prior_post <- function(gen,rec,...){ -->
<!-- df1 <- gen %>% -->
<!-- gather_draws(...) %>% ungroup() %>% mutate(model = "ground truth") -->
<!-- df2 <- rec %>% -->
<!-- gather_draws(...) %>% ungroup() %>% mutate(model = "posterior") -->
<!-- df <- bind_rows(df1,df2) %>% mutate(model = as.factor(model)) -->
<!-- return(df) -->
<!-- } -->
<!-- ``` -->
<!-- ## Will this work for learning rate estimation? -->
<!-- ```{r, eval=FALSE} -->
<!-- fit_gen$summary() -->
<!-- fit_gen %>% spread_draws(theta[n]) %>% ggplot(aes(x = theta)) + stat_dots(quantiles = 100) -->
<!-- fit_gen %>% spread_draws(psi_pred[n]) %>% ggplot(aes(x = psi_pred)) + stat_dots(quantiles = 100) -->
<!-- theta <- fit_gen %>% spread_draws(theta[n]) %>% sample_draws(ndraws = 1, seed = 789) %>% pull(theta) -->
<!-- hist(theta) -->
<!-- # psi <- fit_gen %>% spread_draws(psi_pred[n]) %>% sample_draws(ndraws = 1, seed = 789) %>% pull(psi_pred) -->
<!-- # hist(psi) -->
<!-- fit_gen$summary() -->
<!-- y_tot <- fit_gen %>% spread_draws(y_pred[n,t]) %>% sample_draws(ndraws = 1, seed = 789) %>% ungroup(t) %>% summarise(total = sum(y_pred)) -->
<!-- fit_gen %>% spread_draws(psi_pred[n]) %>% sample_draws(ndraws = 1, seed = 789) %>% select(n,psi_pred) %>% left_join(y_tot, by = "n") %>% ggplot(aes(x = psi_pred, y = total)) + geom_point() + geom_smooth(method = "lm") -->
<!-- hist(y_tot$total) -->
<!-- for (i in c(0.1,0.3,0.5,0.7)){ -->
<!-- datalist <- list(N = 100, -->
<!-- theta = rnorm(100,0,1), -->
<!-- psi = rnorm(100,0,1), -->
<!-- prior_only = 1, -->
<!-- gp_theta_mu_pr = 5, -->
<!-- gp_theta_sigma_pr = 1, -->
<!-- gp_sigma_mu_pr = 2, -->
<!-- gp_sigma_sigma_pr = 1, -->
<!-- gp_m_mu_pr = i, -->
<!-- gp_m_sigma_pr = 0.1, -->
<!-- psi_pr = 0.1 -->
<!-- ) -->
<!-- fit_gen <- mod$sample( -->
<!-- data = datalist, -->
<!-- seed = 123, -->
<!-- chains = 4, -->
<!-- parallel_chains = 4, -->
<!-- refresh = 100, -->
<!-- iter_warmup = 250, -->
<!-- iter_sampling = 1000 -->
<!-- ) -->
<!-- theta <- fit_gen %>% spread_draws(theta_pred[n]) %>% sample_draws(ndraws = 1, seed = 789) %>% select(!starts_with(".")) -->
<!-- psi <- fit_gen %>% spread_draws(psi_pred[n]) %>% sample_draws(ndraws = 1, seed = 789) %>% select(!starts_with(".")) -->
<!-- df <- left_join(theta,psi,by = "n") -->
<!-- assign(paste("p",i,sep = "_"), -->
<!-- ggplot(df, aes(x = theta_pred, y = psi_pred)) + -->
<!-- geom_point() + -->
<!-- geom_smooth(method = "lm", se = FALSE) + -->
<!-- ggtitle(paste("m set at",i)) -->
<!-- ) -->
<!-- } -->
<!-- p_0.1 + p_0.3 + p_0.5 + p_0.7 -->
<!-- ``` -->
<!-- ```{r, eval=FALSE} -->
<!-- ``` -->
<!-- ```{r, eval=FALSE} -->
<!-- inv_logit_scaled(standardise(c(1,9,4,5,6,3,3,3,3,3,3,3,3,3))) -->
<!-- inv_logit_scaled(-1) -->
<!-- normalize(c(1,9,4,5,6,3,3,3,3)) - 0.5 -->
<!-- standardise(c(1,9,4,5,6,3,3,3,3,3,3,3,3,3)) -->
<!-- # Define the error function (erf) -->
<!-- erf <- function(x) { -->
<!-- s <- ifelse(x >= 0, 1, -1) -->
<!-- x <- abs(x) -->
<!-- t <- 1 / (1 + 0.5 * x) -->
<!-- y <- 1 - (((((1.061405429 * t - 1.453152027) * t + 1.421413741) * t - 0.284496736) * t + 0.254829592) * t) * exp(-x * x) -->
<!-- return(s * y) -->
<!-- } -->
<!-- # Define the complementary error function (erfc) -->
<!-- erfc <- function(x) { -->
<!-- return(1 - erf(x)) -->
<!-- } -->
<!-- # Define the phi_approx function using erfc -->
<!-- phi_approx <- function(x) { -->
<!-- if (x > 0) { -->
<!-- return(0.5 * erfc(-x / sqrt(2))) -->
<!-- } else { -->
<!-- return(0.5 * erf(x / sqrt(2)) + 0.5) -->
<!-- } -->
<!-- } -->
<!-- phi_approx(-2) -->
<!-- # Test the functions -->
<!-- x <- seq(-5, 5, length.out = 100) -->
<!-- x <- rbeta(100,10,100) -->
<!-- x <- rnorm(100,5,2) -->
<!-- x <- 5 -->
<!-- # hist(x) -->
<!-- y_erf <- sapply(x, erf) -->
<!-- # hist(y_erf) -->
<!-- y_erfc <- sapply(x, erfc) -->
<!-- # hist(y_erfc) -->
<!-- y_phi <- sapply(x, phi_approx) -->
<!-- # hist(y_phi) -->
<!-- # Plot the results -->
<!-- # plot(x, y_erf, type = "l", col = "blue", lwd = 2, ylim = c(-1.2, 1.2), xlab = "x", ylab = "Value", main = "Error Function and Complementary Error Function") -->
<!-- # lines(x, y_erfc, col = "red", lwd = 2) -->
<!-- # legend("topleft", legend = c("erf(x)", "erfc(x)"), col = c("blue", "red"), lwd = 2) -->
<!-- plot(x, y_phi, type = "p", col = "green", lwd = 2, ylim = c(0, 1), xlab = "x", ylab = "Phi Approximation", main = "Phi Approximation Function") -->
<!-- ``` -->
<!-- ## Will this work for learning rate estimation? -->
<!-- ```{r, eval=FALSE} -->
<!-- ppc_plot <- function(df){ -->
<!-- df %>% ggplot(aes(y = .variable, x = .value, group = model, colour = model, fill = model)) + stat_eye() + facet_wrap(~.variable, scales = "free_x") -->
<!-- } -->
<!-- merged_df <- merge_prior_post(fit_gen,fit_rec,gp_theta, gp_sigma) -->
<!-- p1 <- merged_df %>% -->
<!-- filter(model == "prior") %>% -->
<!-- ggplot(aes(y = .variable)) + stat_eye(aes(x = .value), fill = "red", colour = "pink", alpha = 0.3) + facet_wrap(~.variable, scales = "free_x") -->
<!-- p1 + stat_eye(data = merged_df %>% filter(model == "posterior"), aes(x = .value), fill = "blue", colour = "cyan", alpha = 0.8) -->
<!-- merged_df %>% -->
<!-- filter(.variable == "gp_theta") %>% -->
<!-- ggplot() + -->
<!-- stat_slabinterval(aes(x = .value)) + -->
<!-- # stat_halfeye(aes(x = .value)) + -->
<!-- facet_wrap(~model, scales = "free_x") -->
<!-- fit_gen %>% -->
<!--   gather_draws(gp_theta, gp_sigma) %>% -->
<!--   ggplot(aes(x = .value, color = .variable)) + -->
<!--   geom_density() -->
<!-- prior_post = data.frame( -->
<!-- prior = distributional::dist_normal(0,1), -->
<!-- posterior = distributional::dist_normal(0.1, 0.5) -->
<!-- ) -->
<!-- prior_post %>% ggplot() + stat_eye(aes(xdist = prior)) -->
<!-- fit_rec %>% gather_draws(gp_theta,gp_sigma) %>% mutate(parameter = as.factor(.variable)) %>% ggplot(aes(y = parameter)) + stat_eye(aes(x = .value)) -->
<!-- fit_rec %>% spread_draws(gp_theta, gp_sigma) %>% ggplot() + stat_eye(aes(x = gp_theta, y = "gp_theta")) + stat_eye(aes(x = gp_sigma, y = "gp_sigma")) -->
<!-- fit_psi %>% spread_draws(theta[n]) %>% ggplot(aes(x = theta)) + stat_dots(quantiles = 100) -->
<!-- # theta <- fit_psi %>% spread_draws(theta_pred[n]) %>% posterior::summarise_draws() %>% pull(median) -->
<!-- hist(theta) -->
<!-- fit_rec$summary() -->
<!-- set.seed(1234) -->
<!-- df = tribble( -->
<!-- ~group, ~subgroup, ~value, -->
<!-- "a", "h", rnorm(1000, mean = 5), -->
<!-- "b", "h", rnorm(1000, mean = 7, sd = 1.5), -->
<!-- "c", "h", rnorm(1000, mean = 8), -->
<!-- "c", "i", rnorm(1000, mean = 9), -->
<!-- "c", "j", rnorm(1000, mean = 7) -->
<!-- ) %>% -->
<!-- unnest(value) -->
<!-- df %>% ggplot(aes(y = group, x = value)) + stat_eye() -->
<!-- ``` -->
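The prior-vs-posterior overlays attempted above can be prototyped without a fitted model. A self-contained sketch using simulated draws in place of `fit_gen`/`fit_rec`; the means and sds here are made up for illustration:

```{r prior-post-sketch}
library(ggplot2)
library(ggdist)

set.seed(1234)
# Long-format draws: one row per draw, tagged by parameter and model
sim_draws <- data.frame(
  .variable = rep(rep(c("gp_theta", "gp_sigma"), each = 1000), times = 2),
  model     = rep(c("prior", "posterior"), each = 2000),
  .value    = c(rnorm(1000, 0, 1),     rnorm(1000, 0, 1),      # priors
                rnorm(1000, 0.1, 0.5), rnorm(1000, 0.3, 0.2))  # posteriors
)

# One eye plot per model row, faceted by parameter
ggplot(sim_draws, aes(x = .value, y = model)) +
  stat_eye(alpha = 0.5) +
  facet_wrap(~ .variable, scales = "free_x")
```

Swapping `sim_draws` for the output of `gather_draws(fit, gp_theta, gp_sigma)` (plus a `model` column) gives the real comparison.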
<!-- ```{r, eval=FALSE} -->
<!-- draws <- fit_psi$draws(format = "df") %>% summarise_draws() -->
<!-- data("anscombe") -->
<!-- anscombe -->
<!-- fit <- brms::brm( y1 ~ x1 , -->
<!-- data = anscombe, -->
<!-- family = gaussian(), -->
<!-- prior = brms::prior(normal(10,5)), -->
<!-- sample_prior = TRUE, -->
<!-- ) -->
<!-- fit$prior -->
<!-- fit$model -->
<!-- fit$autocor -->
<!-- summary(fit) -->
<!-- ``` -->
<!-- # Model 1 -->
<!-- ```{r eval=FALSE} -->
<!-- file <- "~/FLARe/FLARe_Modelling/Stan_experimental/embedding.stan" -->
<!-- mod <- cmdstan_model(file) -->
<!-- mod$check_syntax(pedantic = TRUE) -->
<!-- # -->
<!-- # for (p in 1:2){ -->
<!-- # for (l in 1:4){ -->
<!-- # cat("MCMC Underway, LR:",l,"Phi",p) -->
<!-- l <- 1 -->
<!-- p <- 1 -->
<!-- datalist <- list(phases = 2, -->
<!-- n_trials = c(dim(fc_list_gad[[1]])[2],dim(fc_list_gad[[2]])[2]), -->
<!-- n_subjects = dim(fc_list_gad[[1]])[1], -->
<!-- cs_acq = fc_list_gad[[1]], -->
<!-- cs_ext = fc_list_gad[[2]], -->
<!-- er_acq = fc_list_gad[[3]], -->
<!-- er_ext = fc_list_gad[[4]], -->
<!-- us = c(1,0), -->
<!-- n_lr = l, -->
<!-- n_phi = p, -->
<!-- gad = gad_vec_s -->
<!-- ) -->
<!-- fit_prog <- mod$sample( -->
<!-- data = datalist, -->
<!-- seed = 123, -->
<!-- chains = 4, -->
<!-- parallel_chains = 4, -->
<!-- refresh = 500, # print update every 500 iters -->
<!-- iter_sampling = 1000, -->
<!-- iter_warmup = 500 -->
<!-- ) -->
<!-- summ <- fit_prog$summary() -->
<!-- summ -->
<!-- # -->
<!-- # cat("MCMC done, LR:",l,"Phi",p) -->
<!-- # -->
<!-- # fit_prog$save_object(file = paste0("./Fits/fit_prog_lr_",l,"_phi_",p,"_LL.RDS")) -->
<!-- # }} -->
<!-- ``` -->
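Before handing `datalist` to `mod$sample()`, the dimensions it derives from `fc_list_gad` can be sanity-checked cheaply. A sketch with hypothetical stand-ins for the data (50 subjects, 12 acquisition and 18 extinction trials are made-up sizes, and `gad_vec_s` is simulated here):

```{r datalist-sketch}
set.seed(1)
# Hypothetical stand-ins for fc_list_gad: CS and expectancy ratings per phase
fc_list_gad <- list(
  matrix(rnorm(50 * 12), nrow = 50, ncol = 12),  # cs_acq
  matrix(rnorm(50 * 18), nrow = 50, ncol = 18),  # cs_ext
  matrix(rnorm(50 * 12), nrow = 50, ncol = 12),  # er_acq
  matrix(rnorm(50 * 18), nrow = 50, ncol = 18)   # er_ext
)
gad_vec_s <- rnorm(50)  # hypothetical standardised GAD scores

datalist <- list(
  phases     = 2,
  n_trials   = c(ncol(fc_list_gad[[1]]), ncol(fc_list_gad[[2]])),
  n_subjects = nrow(fc_list_gad[[1]]),
  cs_acq = fc_list_gad[[1]], cs_ext = fc_list_gad[[2]],
  er_acq = fc_list_gad[[3]], er_ext = fc_list_gad[[4]],
  us = c(1, 0), n_lr = 1, n_phi = 1, gad = gad_vec_s
)

# Every matrix must share the subject dimension, and gad must match it
stopifnot(length(unique(sapply(fc_list_gad, nrow))) == 1)
stopifnot(length(datalist$gad) == datalist$n_subjects)
```

Using `ncol()`/`nrow()` rather than `dim()[2]`/`dim()[1]` makes the intent of each element clearer.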
<!-- ## Test -->
<!-- ```{r eval=FALSE} -->
<!-- file <- "~/FLARe/FLARe_Modelling/Stan_test/Programatic_multi_lr_LL_test.stan" -->
<!-- mod <- cmdstan_model(file) -->
<!-- mod$check_syntax(pedantic = TRUE) -->
<!-- for (p in 1:2){ -->
<!-- for (l in 1:4){ -->
<!-- cat("MCMC Underway, LR:",l,"Phi",p) -->
<!-- datalist <- list(phases = 2, -->
<!-- n_trials = c(dim(fc_list[[1]])[2],dim(fc_list[[2]])[2]), -->
<!-- n_subjects = dim(fc_list[[1]])[1], -->
<!-- cs_acq = fc_list[[1]], -->
<!-- cs_ext = fc_list[[2]], -->
<!-- er_acq = fc_list[[3]], -->
<!-- er_ext = fc_list[[4]], -->
<!-- us = c(1,0), -->
<!-- n_lr = l, -->
<!-- n_phi = p -->
<!-- ) -->
<!-- fit_prog <- mod$sample( -->
<!-- data = datalist, -->
<!-- seed = 123, -->
<!-- chains = 4, -->
<!-- parallel_chains = 4, -->
<!-- refresh = 500, # print update every 500 iters -->
<!-- iter_sampling = 40, -->
<!-- iter_warmup = 10 -->
<!-- ) -->
<!-- cat("MCMC done, LR:",l,"Phi",p) -->
<!-- fit_prog$save_object(file = paste0("./Fits/fit_prog_lr_",l,"_phi_",p,"_LL.RDS")) -->
<!-- }} -->
<!-- ll <- fit_prog$draws("test") -->
<!-- dim(ll) -->
<!-- ll[10,2,640] -->
<!-- ``` -->
<!-- ## Run through Stan -->
<!-- ```{r stan runthrough, eval = FALSE} -->
<!-- file <- "~/FLARe/FLARe_Modelling/Stan_test/Hierarchical_RL_Gen_Opt_Missing.stan" -->
<!-- mod <- cmdstan_model(file) -->
<!-- mod$check_syntax(pedantic = TRUE) -->
<!-- datalist <- list(phases = 2, -->
<!-- n_trials = c(dim(fc_list[[1]])[2],dim(fc_list[[2]])[2]), -->
<!-- n_subjects = dim(fc_list[[1]])[1], -->
<!-- cs_acq = fc_list[[1]], -->
<!-- cs_ext = fc_list[[2]], -->
<!-- er_acq = fc_list[[3]], -->
<!-- er_ext = fc_list[[4]], -->
<!-- us = c(1,0) -->
<!-- ) -->
<!-- fit <- mod$sample( -->
<!-- data = datalist, -->
<!-- seed = 123, -->
<!-- chains = 4, -->
<!-- parallel_chains = 4, -->
<!-- refresh = 500, # print update every 500 iters -->
<!-- iter_sampling = 20, -->
<!-- iter_warmup = 5 -->
<!-- ) -->
<!-- fit$save_object(file = "./Fits/fit_.RDS") -->
<!-- ``` -->